METHOD AND APPARATUS FOR DATA ANALYSIS 



BACKGROUND OF THE INVENTION 



Data analysis is used in many different areas, such as data mining, statistical 
analysis, artificial intelligence, machine learning, and process control to provide 
information that can be applied to different environments. Usually this analysis is 
performed on a collection of data organised in a database. With large databases, 
computations required for the analysis often take a long time to complete. 

Databases can be used to determine relationships between variables and 
provide a model that can be used in the data analysis. These relationships allow the 
value of one variable to be predicted in terms of the other variables. Minimizing 
computational time is not the only requirement for successful data analysis. 
Overcoming rapid obsolescence of models is another major challenge. 

Currently tasks such as prediction of new conditions, process control, fault 
diagnosis and yield optimization are done using computers or microprocessors 
directed by mathematical models. These models generally need to be "retrained" or 
"recalibrated" frequently in dynamic environments because changing environmental 
conditions render them obsolete. This situation is especially serious when very large 
quantities of data are involved or when large changes to the models are required over 
short periods of time. Obsolescence can originate from new data values being 
drastically different from historical data because of an unforeseen change in the 
environment of a sensor, one or more sensors becoming inoperable during operation 
or new sensors being added to a system for example. 

In real-world applications, there are several other requirements that often 
become vital in addition to computational speed and rapid model obsolescence. For 
example, in some cases the model will need to deal with a stream of data rather than a 
static database. Also, when databases are used they can rapidly outgrow the available
computer storage. Furthermore, existing computer facilities can become
insufficient to accomplish model re-calibration. Often it becomes completely
impractical to use a whole database for re-calibration of the model. At some risk, a
sample is taken from the database and used to obtain the re-calibrated model. In
developing models, "scenario testing" is often used. That is, a variety of models need
to be tried on the data. Even with moderately sized databases this can be a processing
intensive task. For example, although combining variables in a model to form a new
model is very attractive from an efficiency viewpoint (termed here "dimension
reduction"), the number of possible combinations combined with the data processing
usually required for even one model, especially with a large database, makes the idea
impractical with current methods. Finally, often models are used in situations where
they must provide an answer very quickly, sometimes with inadequate data. In credit
scoring for example, a large number of risk factors can affect the credit rating and the
interviewer wishes to obtain the answer from a credit assessment model as rapidly as
possible with a minimum of data. Also, in medical diagnosis, a doctor would like to
converge on the solution with a minimum of questions. Methods which can request
the data needed based on maximizing the probability of arriving at a conclusion as
quickly as possible (termed here "dynamic query") would be very useful in many
diagnostic applications.

Finally, mobile applications are now becoming very important in technology. 
A method of condensing the knowledge in a large database so that it can be used with 
a model in a portable device is highly desirable. 

This situation is becoming increasingly important in an extremely diverse
range of areas ranging from finances to health care and from sports forecasting to
retail needs.

FIELD OF THE INVENTION

The present invention relates to a method and apparatus for data analysis.

DESCRIPTION OF THE PRIOR ART 



The primary focus in the previous art has been upon reducing
computational time. Recent developments in database technology are beginning to
emphasize "automatic summary tables" ("AST's") that contain pre-computed
quantities needed by "queries" to the database. These AST's provide a "materialized
view" of the data and greatly increase the speed of response to queries. Efficiently
updating the AST's with new data records, as the new data becomes available for the
database, has been the subject of many publications. Initially only very simple queries
were considered. Most recently, incrementally updating an AST in accordance with a
method of updating AST's that applies to all "aggregate functions" has been
proposed. However, although the AST's speed up the response to queries, they are
very extensive compilations of data and therefore incremental re-computation is
generally a necessity for their maintenance. Palpanas et al. proposed what they term
as "the first" general algorithm to efficiently re-compute only the groups in the AST
which need to be updated in order to reply to the query. However, their method is a
very involved one. It includes a considerable amount of work to select the groups that
are to be updated. Their experiments indicate that their method runs in 20% to 60%
of the time required for a "full refresh" of the AST. There is increasing interest in
using AST's to respond to queries that originate from On-line Analytical Processing
("OLAP"). These can involve standard statistical or data-mining methods.

Chen et al. examined the problem of applying OLAP to dynamic rather than
static situations. In particular, they were interested in multi-dimensional regression
analysis of time-series data streams. They recognized that it should be possible to use
only a small number of pre-computed quantities rather than all of the data. However,
the algorithms that they propose are very involved and constrained in their utility.

U.S. Patent 6,553,366 shows how great economies of data storage
requirements and time can be obtained by storing and using various "scalable data
mining functions" computed from a relational database. This is the most recent
version of the "automatic summary table" idea.

Thus, although the prior art has recognized that pre-computing quantities
needed in subsequent modeling calculations saves time and data storage, the methods
developed fail to satisfy some or all of the other requirements mentioned above.
Often they can add records to, but cannot remove records from, their "static" databases.
Adding new variables or removing variables "on the fly" (in real time) is not
generally known. They are not used to combine databases or for parallel processing.
Scenario testing is very limited and does not involve dimension reduction. Dynamic 
query is not done with static decision trees being commonplace. Methods are 
generally embedded in large office information systems with so many quantities 
computed and so many ties to existing interfaces that portability is challenging. 

It is therefore an object of the present invention to provide a method of and 
apparatus for data analysis that obviates or mitigates some of the above 
disadvantages. 

SUMMARY OF THE INVENTION

In one aspect, the present invention provides a "knowledge entity" that may
be used to perform incremental learning. The knowledge entity is conveniently
represented as a matrix where one dimension represents independent variables and the
other dimension represents dependent variables. For each possible pairing of
variables, the knowledge entity stores selected combinations of either or both of the
variables. These selected combinations are termed the "knowledge elements" of the
knowledge entity. This knowledge entity may be updated efficiently with new records
by matrix addition. Furthermore, data can be removed from the knowledge entity by
matrix subtraction. Variables can be added or removed from the knowledge entity by
adding or removing a set of cells, such as a row or column to one or both dimensions.

Preferably the number of joint occurrences of the variables is stored with the 
selected combinations. 



Exemplary combinations of the variables are the sum of values of the first 
variable for each joint occurrence, the sum of values of the second variable for each 
joint occurrence, and the sum of the product of the values of each variable. 



In one further aspect of the present invention, there is provided a method of
performing a data analysis by collecting data in such a knowledge entity and
utilising it in a subsequent analysis.

According to another aspect of the present invention, there is provided a
process modelling system utilising such a knowledge entity.

According to other aspects of the present invention, there is provided either a
learner or predictor using such a knowledge entity.

The term "analytical engine" is used to describe the knowledge entity together 
with the methods required to use it to accomplish incremental learning operations, 
parallel processing operations, scenario testing operations, dimension reduction 
operations, dynamic query operations and/or distributed processing operations. These 
methods include but are not limited to methods for data collecting, management of the 
knowledge elements, modelling and use of the modelling (for prediction for example). 
Some aspects of the management of the knowledge elements may be delegated to a 
conventional data management system (simple summations of historical data for 
example). However, the knowledge entity is a collection of knowledge elements 
specifically selected so as to enable the knowledge entity to accomplish the desired 
operations. When modeling is accomplished using the knowledge entity it is referred 
to as "intelligent modeling" because the resulting model receives one or more 
characteristics of intelligence. These characteristics include: the ability to 
immediately utilize new data, to purposefully ignore some data, to incorporate new 
variables, to not use specific variables and, if necessary, to be able to utilize these
characteristics on-line (at the point of use) and in real time. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Embodiments of the invention will now be described by way of example only 
with reference to the accompanying drawings in which: 



Figure 1 is a schematic diagram of a processing apparatus;

Figure 2 is a representation of a controller for the processing apparatus of
Figure 1;

Figure 3 is a schematic of the knowledge entity used in the controller of
Figure 2;

Figure 4 is a flow chart of a method performed by the controller of Figure 2;

Figure 5 is another flow chart of a method performed by the controller of
Figure 2;

Figure 6 is a further flow chart of a method performed by the controller of
Figure 2;

Figure 7 is a yet further flow chart of a method performed by the controller of
Figure 2;

Figure 8 is a still further flow chart of a method performed by the controller of
Figure 2;

Figure 9 is a schematic diagram of a robotic arm;

Figure 10 is a schematic diagram of a Markov chain;

Figure 11 is a schematic diagram of a Hidden Markov model;

Figure 12 is another schematic diagram of a Hidden Markov model.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

To assist in understanding the concepts embodied in the present invention and 
to demonstrate the industrial applicability thereof with its inherent technical effect, a
first embodiment will describe how the analytical engine enables application to the
knowledge entity of incremental learning operations for the purpose of process 
monitoring and control. It will be appreciated that the form of the processing 
apparatus is purely for exemplary purposes to assist in the explanation of the use of 
the knowledge entity shown in Figure 3, and is not intended to limit the application to 
the particular apparatus or to process control environments. Subsequent embodiments 
will likewise illustrate the flexibility and general applicability in other environments. 

Referring therefore to Figure 1, a dryer 10 has a feed tube 12 for receiving 
wet feed 34. The feed tube 12 empties into a main chamber 30. The main chamber 30 
has a lower plate 14 to form a plenum 32. An air inlet 18 forces air into a heater 16 to 
provide hot air to the plenum 32. An outlet tube 28 receives dried material from the 
main chamber 30. An air outlet 20 exhausts air from the main chamber 30.

The dryer 10 is operated to produce dried material, and it is desirable to 
control the rate of production. An exemplary operational goal is to produce 100 kg of 
dried material per hour. 

The dryer receives wet feed 34 through the feed tube 12 at an adjustable and 
observable rate. The flow rate from outlet tube 28 can also be monitored. The flow 
rate from outlet tube 28 is related to operational parameters such as the wet feed flow 
rate, the temperature provided by heater 16, and the rate of air flow from air inlet 18. 
The dryer 10 incorporates a sensor for each operational parameter, with each sensor 
connected to a controller 40 shown in detail in Figure 2. The controller 40 has a data 
collection unit 42, which receives inputs from the sensors associated with the wet feed 
tube 12, the heater 16, the air inlet 18, and the output tube 28 to collect data. 

The controller 40 has a learner 44 that processes the collected data into a 
knowledge entity 46. The knowledge entity 46 organises the data obtained from the 
operational parameters and the output flow rate. The knowledge entity 46 is initialised 
to notionally contain all zeroes before its first use. The controller 40 uses a modeller 
48 to form a model of the collected data from the knowledge entity 46. The controller 
40 has a predictor 50 that can set the operational parameters to try to achieve the
operational goal. Thus, as the controller operates the dryer 10, it can monitor the
production and incrementally learn a better model. 

The controller 40 operates to adjust the operational parameters to control the 
rate of production. Initially the dryer 10 is operated with manually set operational 
parameters. The initial operation will produce training data from the various sensors, 
including output rate. 

The data collector 42 receives signals related to each of the operational 
parameters and the output rate, namely a measure of the wet feed rate from the wet 
feed tube 12, a measure of the air temperature from the heater 16, a measure of the air 
flow from the air inlet 18, and a measure of the output flow rate from the output tube
28. 

The learner 44 transforms the collected data into the knowledge entity of 
Figure 3 as each measurement is received. As can be seen in Figure 3, the knowledge 
entity 46 is organised as an orthogonal matrix having a row and a column for each of 
the sensed operating parameters. The intersection of each row and column defines a 
cell in which a set of combinations of the variable in the respective row and column is 
accumulated. 

In the embodiment of Figure 3, for each pairing of variables, a set of four
combinations is obtained. The first combination, $n_{ij}$, is a count of the number of joint
occurrences of the two variables. The combination $\sum X_i$ represents the total of all
measurements of the first variable $X_i$, which is one of the sensed operational
parameters. The second quantity $\sum X_j$ records the total of all measurements of the
second variable $X_j$, which is another of the sensed operational parameters. Finally,
$\sum X_i X_j$ records the total of the products of all measurements of both variables. It is
noted that the summations are over all observed measurements of the variables.

These combinations are additive, and accordingly can be computed
incrementally. For example, given observed measurements [3, 4, 5, 6] for the variable
$X_i$, then $\sum X_i = 3 + 4 + 5 + 6 = 18$. If the measurements are subdivided into two
collections of observed measurements [3, 4] and [5, 6], for example from sensors at
two different locations, then $\sum_{[3,4]} X_i = 7$ and $\sum_{[5,6]} X_i = 11$, so
$\sum_{[3,4,5,6]} X_i = \sum_{[3,4]} X_i + \sum_{[5,6]} X_i = 18$.
The nature of the subdivision is not relevant, so the combination can be computed
incrementally for successive measurements, and two collections of measurements can
be combined by addition of their respective combinations.
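
The additivity described above can be checked with a minimal Python sketch. The function name `total` and the two-location split are illustrative assumptions, not part of the described apparatus; the measurement values are those used in the text.

```python
def total(values):
    """Sum of a collection of measurements (the sum-of-X combination)."""
    return sum(values)

first = [3, 4]    # e.g. measurements from a sensor at one location
second = [5, 6]   # e.g. measurements from a sensor at another location

# Combining the two partial sums gives the same result as summing everything.
combined = total(first) + total(second)
direct = total(first + second)
print(combined, direct)  # 18 18
```

Because only the partial sums are needed to form the combined sum, the individual measurements need not be retained once they have been accumulated.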

In general, the combinations of parameters accumulated should have the
property that given a first and second collection of data, the value of the combination
of the collections may be efficiently computed from the values of the collections
themselves. In other words, the value obtained for a combination of two collections of
data may be obtained from operations on the value of the collections rather than on
the individual elements of the collections.

It is also recognised that the above combinations have the property that given
a collection of data and additional data, which can be combined into an augmented
collection of data, the value of the combination for the augmented collection of data is
efficiently computable from the value of the combination for the collection of data
and the value of the combination for the additional data. This property allows
combination of two collections of measurements.

An example of data received by the data collector 42 from the dryer of Figure
1 in four separate measurements is as follows:

Measurement | Wet Feed Rate | Air Temperature | Air Flow | Dry Output Rate
------------|---------------|-----------------|----------|----------------
1           | 10            | 30              | 110      | 2
2           | 15            | 35              | 115      | 3
3           | 5             | 40              | 120      | 1.5
4           | 15            | 50              | 140      | 6

Table 1

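With the measurements shown above in Table 1, measurement 1 is
transformed into the following record represented as an orthogonal matrix:

Measurement 1   | Element        | Wet Feed Rate | Air Temperature | Air Flow | Dry Output Rate
----------------|----------------|---------------|-----------------|----------|----------------
Wet Feed Rate   | $n_{ij}$       | 1             | 1               | 1        | 1
                | $\sum X_i$     | 10            | 10              | 10       | 10
                | $\sum X_j$     | 10            | 30              | 110      | 2
                | $\sum X_i X_j$ | 100           | 300             | 1100     | 20
Air Temperature | $n_{ij}$       | 1             | 1               | 1        | 1
                | $\sum X_i$     | 30            | 30              | 30       | 30
                | $\sum X_j$     | 10            | 30              | 110      | 2
                | $\sum X_i X_j$ | 300           | 900             | 3300     | 60
Air Flow        | $n_{ij}$       | 1             | 1               | 1        | 1
                | $\sum X_i$     | 110           | 110             | 110      | 110
                | $\sum X_j$     | 10            | 30              | 110      | 2
                | $\sum X_i X_j$ | 1100          | 3300            | 12100    | 220
Dry Output Rate | $n_{ij}$       | 1             | 1               | 1        | 1
                | $\sum X_i$     | 2             | 2               | 2        | 2
                | $\sum X_j$     | 10            | 30              | 110      | 2
                | $\sum X_i X_j$ | 20            | 60              | 220      | 4

Table 2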


This measurement is added to the knowledge entity 46 by the learner 44. Each
subsequent measurement is transformed into a similar table and added to the
knowledge entity 46 by the learner 44.

For example, upon receipt of the second measurement, the cell at the
intersection of the wet feed row and air temperature column would be updated to
contain:

Wet Feed Rate  | Air Temperature
---------------|------------------
$n_{ij}$       | 1 + 1 = 2
$\sum X_i$     | 10 + 15 = 25
$\sum X_j$     | 30 + 35 = 65
$\sum X_i X_j$ | 300 + 525 = 825

Table 3



Successive measurements can be added incrementally to the knowledge entity
46 since the knowledge entity for a new set of data is equal to the sum of the
knowledge entity for an old set of data with the knowledge entity of the additional data.
Each of the combinations F used in the knowledge entity 46 has the exemplary
property that $F(A \cup B) = F(A) + F(B)$ for sets A and B. Further properties of the
knowledge entity 46 will be discussed in more detail below.
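
The incremental update can be sketched in Python as follows. This is an illustrative sketch only: the dictionary layout, the function names `empty_entity` and `learn`, and the variable names are assumptions made for the example, with each cell holding the four combinations in the order count, sum of X_i, sum of X_j, sum of X_i X_j.

```python
from itertools import product

# Hypothetical variable names for the four sensed quantities of Table 1.
VARIABLES = ["wet_feed", "air_temp", "air_flow", "dry_output"]

def empty_entity(names):
    """Knowledge entity with every cell [n, sum_xi, sum_xj, sum_xixj] set to zero."""
    return {(i, j): [0, 0, 0, 0] for i, j in product(names, repeat=2)}

def learn(entity, record):
    """Add one measurement (a dict of variable -> value) to the entity in place."""
    for (i, j), cell in entity.items():
        xi, xj = record[i], record[j]
        cell[0] += 1          # joint occurrences
        cell[1] += xi         # sum of X_i
        cell[2] += xj         # sum of X_j
        cell[3] += xi * xj    # sum of X_i * X_j
    return entity

entity = empty_entity(VARIABLES)
learn(entity, {"wet_feed": 10, "air_temp": 30, "air_flow": 110, "dry_output": 2})
learn(entity, {"wet_feed": 15, "air_temp": 35, "air_flow": 115, "dry_output": 3})
print(entity[("wet_feed", "air_temp")])  # [2, 25, 65, 825], matching Table 3
```

Each call to `learn` touches every cell once, so the cost of absorbing a record depends on the number of variables and not on the number of records already learned.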

As data are collected, the controller 40 accumulates data in the knowledge 
entity 46 which may be used for modelling and prediction. The modeller 48 
determines the parameters of a predetermined model based on the knowledge entity
46. The predictor 50 can then use the model parameters to determine desirable
settings for the operational parameters. 

After the controller 40 has been trained, it can begin to control the dryer 10
using the predictor 50. Suppose that the operator instructs the controller 40 through
the user interface 52 to set the production rate to 100 kg/h by varying the air
temperature at heater 16, and that the appropriate control method uses a linear
regression model.

The modeller 48 computes regression coefficients as shown in Figure 4
generally by the numeral 100. At step 102, the modeller computes a covariance table.
Covariance between two variables $X_i$ and $X_j$ may be computed as

$$\mathrm{Covar}_{ij} = \frac{\sum X_i X_j - \dfrac{\sum X_i \sum X_j}{n_{ij}}}{n_{ij}}$$

Since each of these terms is one of the
combinations stored in the knowledge entity 46 at the intersection of row i and
column j, computation of the covariance for each pair of variables is done with one
multiplication, two divisions and one subtraction. When i = j, the covariance is equal
to the variance, i.e. $\mathrm{Covar}_{ij} = \mathrm{Var}_i = \mathrm{Var}_j$. The modeller 48 uses this relationship to compute the
covariance between each pair of variables.
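
As a hedged illustration of step 102, the sketch below computes a covariance from a single cell of the knowledge entity. It assumes the population form of the covariance (division by the count) and a cell laid out as (count, sum of X_i, sum of X_j, sum of X_i X_j); the function name `covariance` is hypothetical. The cell values are the totals of the four measurements in Table 1.

```python
def covariance(cell):
    """Covariance from one cell (n, sum_xi, sum_xj, sum_xixj) of the knowledge entity."""
    n, sum_xi, sum_xj, sum_xixj = cell
    return (sum_xixj - sum_xi * sum_xj / n) / n

# Totals over the four measurements of Table 1:
# wet feed rate = 10, 15, 5, 15 and air temperature = 30, 35, 40, 50.
feed_temp = (4, 45, 155, 1775)   # wet feed rate against air temperature
feed_feed = (4, 45, 45, 575)     # diagonal cell: gives the variance of wet feed rate

print(covariance(feed_temp))  # 7.8125
print(covariance(feed_feed))  # 17.1875
```

No pass over the individual measurements is needed; the four stored numbers are sufficient.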

Then at step 104, the modeller 48 computes a correlation table. The correlation
between two variables $X_i$ and $X_j$ may be computed as

$$R_{ij} = \frac{\mathrm{Covar}_{ij}}{\sqrt{\mathrm{Var}_i \, \mathrm{Var}_j}}$$

Since each of these terms appears in the covariance table obtained from the knowledge entity 46 at
step 102, the correlation coefficient can be computed with one multiplication, one
square root, and one division. The modeller 48 uses this relationship to compute the
correlation between each pair of variables.
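
Step 104 can be sketched in the same way. The function name `correlation_table` and the short variable names are assumptions for the example; the covariance values are those obtained from the Table 1 measurements of wet feed rate and air temperature using the population form of the covariance.

```python
import math

def correlation_table(cov):
    """Build correlations from a covariance table.

    cov maps (i, j) -> covariance, and cov[(i, i)] is the variance of i.
    """
    return {
        (i, j): c / math.sqrt(cov[(i, i)] * cov[(j, j)])
        for (i, j), c in cov.items()
    }

cov = {
    ("feed", "feed"): 17.1875, ("feed", "temp"): 7.8125,
    ("temp", "feed"): 7.8125,  ("temp", "temp"): 54.6875,
}
corr = correlation_table(cov)
print(round(corr[("feed", "temp")], 4))
```

Each entry uses one multiplication, one square root and one division, as stated in the text, and the diagonal entries come out as 1.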

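At step 106, the operator selects a variable Y, for example $X_4$, to model
through the user interface 52. At step 107, the modeller 48 computes the standardized
coefficients $\beta = R_{XX}^{-1} R_{XY}$ using the entries in the correlation table, where $R_{XX}$
holds the correlations among the independent variables and $R_{XY}$ holds the
correlations between each independent variable and Y.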

At step 108, the modeller 48 first computes the standard deviation $s_y$ of the
dependent variable Y and the standard deviation $s_j$ of independent variables $X_j$.
Conveniently, the standard deviations $s_y = \sqrt{\mathrm{Var}_y}$ and $s_j = \sqrt{\mathrm{Var}_j}$ are computed
using the entries from the covariance table. The modeller 48 then computes the
coefficients $b_j = \beta_j \left( \dfrac{s_y}{s_j} \right)$.



At step 109, the modeller 48 computes an intercept
$a = \bar{X}_4 - b_1 \bar{X}_1 - b_2 \bar{X}_2 - b_3 \bar{X}_3$. The modeller 48 then provides the coefficients $a$, $b_1$,
$b_2$, $b_3$ to the predictor 50.

The predictor 50 can then estimate the dependent variable as
$Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3$.

The knowledge entity shown in Figure 3 provides the analytical engine
significant flexibility in handling varying collections of data. Referring to Figure 5 a
method of amalgamating knowledge from another controller is shown generally by
the numeral 110. The controller 40 first receives at step 112 a new knowledge entity
from another controller. The new knowledge entity is organised to be of the same
form as the existing knowledge entity 46. This new knowledge entity may be based
upon a similar process in another factory, or another controller in the same factory, or
even standard test data or historical data. The controller 40 provides at step 114 the
new knowledge entity to learner 44. Learner 44 adds the new knowledge to the
knowledge entity 46 at step 116. The new knowledge is added by performing a matrix
addition (i.e. addition of similar terms) between the knowledge entity 46 and the new
knowledge entity. Once the knowledge entity 46 has been updated, the model is
updated at step 118 by the modeller 48 based on the updated knowledge entity 46.

In some situations it may be necessary to reverse the effects of amalgamating 
knowledge shown in Figure 5. In this case, the method of Figure 6 may be used to 
remove knowledge. Referring therefore to Figure 6, a method of removing knowledge 
from the knowledge entity 46 is shown generally by the numeral 120. To begin, at
step 122 the controller 40 accesses a stored auxiliary knowledge entity. This may be a
record of previously added knowledge from the method of Figure 5. Alternately, 
this may be a record of the knowledge entity at a specific time. For example, it may 
be desirable to eliminate the knowledge added during the first hour of operations, as it 
may relate to startup conditions in the plant which are considered irrelevant to future 
modelling. The stored auxiliary knowledge entity has the same form as the knowledge 
entity 46 shown in Figure 3. The controller 40 provides the auxiliary knowledge entity 
to the learner 44 at step 124. The learner 44 at step 126 then removes the auxiliary 
knowledge from the knowledge entity 46 by subtracting the auxiliary knowledge
entity from knowledge entity 46. Finally at step 128, the model is updated with the
modified knowledge entity 46. 
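
The amalgamation of Figure 5 and the removal of Figure 6 can be sketched as cell-wise addition and subtraction. The sketch is illustrative only: the function name `combine`, the `sign` parameter and the single-cell entities are assumptions chosen to keep the example short, with values taken from the cell shown in Table 3.

```python
def combine(a, b, sign=1):
    """Cell-wise a + sign * b for two knowledge entities with identical keys."""
    return {key: [x + sign * y for x, y in zip(a[key], b[key])] for key in a}

# One cell each, laid out as [n, sum_xi, sum_xj, sum_xixj].
local = {("wet_feed", "air_temp"): [1, 10, 30, 300]}      # existing knowledge
incoming = {("wet_feed", "air_temp"): [1, 15, 35, 525]}   # knowledge from another source

merged = combine(local, incoming)                # amalgamate by addition
restored = combine(merged, incoming, sign=-1)    # remove it again by subtraction

print(merged[("wet_feed", "air_temp")])    # [2, 25, 65, 825]
print(restored[("wet_feed", "air_temp")])  # [1, 10, 30, 300]
```

Keeping a copy of the incoming entity is what makes the later removal possible, which corresponds to the stored auxiliary knowledge entity described above.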

To further refine the modelling, an additional sensor may be added to the dryer
10. For example, a sensor to detect humidity in the air inlet may be used to consider
the effects of external humidity on the system. In this case, the model may be updated
by performing the method shown generally by the numeral 130 in Figure 7. First a
new sensor is added at step 132. The learner 44 then expands the knowledge entity by
adding a row and a column. The combinations in the new row and the new column
have notional values of zero. The controller 40 then proceeds to collect data at step
136. The collected data will include that obtained from the old sensors and that of the
new sensor. This information is learned at step 138 in the same manner as before. The
knowledge entity 46 in the analytical engine can then be used with the new sensor to
obtain the coefficients of the linear regression using all the sensors including the new
sensor. It will be appreciated that since the values of $n_{ij}$ in the new row and column
initially are zero, there will be a significant difference between the values of $n_{ij}$
in the new row and column and in the old rows and columns. This difference reflects
that more data has been collected for the original rows and columns. It will therefore
be recognised that provision of the value of $n_{ij}$ contributes to the flexibility of the
knowledge entity.
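
A minimal sketch of expanding the knowledge entity for a new sensor is given below. The function names `learn` and `add_variable`, the variable names, and the humidity reading of 0.6 are hypothetical and chosen only for illustration.

```python
def learn(entity, record):
    """Add one record (dict of variable -> value) to every cell it covers."""
    for (i, j), cell in entity.items():
        if i in record and j in record:
            cell[0] += 1                      # joint occurrences
            cell[1] += record[i]              # sum of X_i
            cell[2] += record[j]              # sum of X_j
            cell[3] += record[i] * record[j]  # sum of X_i * X_j

def add_variable(entity, names, new_name):
    """Append a zero-filled row and column for new_name; returns the new name list."""
    names = names + [new_name]
    for other in names:
        entity.setdefault((new_name, other), [0, 0, 0, 0])
        entity.setdefault((other, new_name), [0, 0, 0, 0])
    return names

names = ["wet_feed", "air_temp"]
entity = {(i, j): [0, 0, 0, 0] for i in names for j in names}
learn(entity, {"wet_feed": 10, "air_temp": 30})
learn(entity, {"wet_feed": 15, "air_temp": 35})

names = add_variable(entity, names, "humidity")   # new sensor installed
learn(entity, {"wet_feed": 5, "air_temp": 40, "humidity": 0.6})

print(entity[("wet_feed", "air_temp")][0])   # 3 joint occurrences
print(entity[("wet_feed", "humidity")][0])   # 1 joint occurrence
```

The differing counts show why the number of joint occurrences is stored per cell: the older cells have seen three records while the cells involving the new sensor have seen one.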

It may also be desirable to eliminate a sensor from the model. For example, it
may be discovered that air flow does not affect the output speed, or that air flow may
be too expensive to measure. The method shown generally as 140 in Figure 8 allows
an operational parameter to be removed from the knowledge entity 46. At step 142, an
operational parameter is no longer relevant. The operational parameter corresponds to
a variable in the knowledge entity 46. The learner 44 then contracts the knowledge
entity at step 144 by deleting the row and column corresponding to the removed
variable. The model is then updated at step 146 to obtain the linear regression
coefficients for the remaining variables to eliminate use of the deleted variable.
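
Contracting the knowledge entity can be sketched as dropping every cell whose row or column belongs to the removed variable. The function name `remove_variable`, the variable names and the placeholder cell contents are assumptions for the example.

```python
def remove_variable(entity, name):
    """Return a copy of the entity without the row and column for `name`."""
    return {key: cell for key, cell in entity.items() if name not in key}

names = ["wet_feed", "air_temp", "air_flow"]
# Placeholder contents: only the keys matter for this illustration.
entity = {(i, j): [4, 0, 0, 0] for i in names for j in names}

reduced = remove_variable(entity, "air_flow")
print(sorted(reduced))
```

The remaining cells are untouched, so the model for the remaining variables can be recomputed directly from them.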

It will be noted in each of these examples that the update is accomplished
without requiring a summing operation for individual values of each of the previous
records. Similarly subtraction is performed without requiring a new summing
operation for the remaining records. No substantial re-training or re-calibration is
required.

DISTRIBUTED AND PARALLEL DATA PROCESSING

A particularly useful attribute of the knowledge entity 46 in the analytical
engine is that it allows databases to be divided up into groups of records with each
group processed separately, possibly in separate computers. After processing, the
results from each of these computers may be combined to achieve the same result as
though the whole data set had been processed all at once in one computer. The
analytical engine is constructed so as to enable application to the knowledge entity of
such parallel processing operations. This can achieve great economies of hardware
and time resources. Furthermore, instead of being all from the one database, some of
these groups of records can originate from other databases. That is, they may be
"distributed" databases. The combination of diverse databases to form a single
knowledge entity and hence models which draw upon all of these databases is then
enabled. That is, the analytical engine enables application to the knowledge entity of
distributed processing as well as parallel processing operations.

As an illustration, if the large database (or distributed databases) can be
divided into ten parts then these parts may be processed on computers 1 to 10
inclusive, for example. In this case, these computers each process the data and
construct a separate knowledge entity. The processing time on each of these
computers depends on the number of records in each subset but the time required by
an eleventh computer to combine the records by processing the knowledge entity is
small (usually a few milliseconds). For example, with a dataset with 1 billion records
that normally requires 10 hours to process in a single computer, the processing time
can be decreased to 1 hour and a few seconds by subdividing the dataset into ten
parts.
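
The ten-part illustration can be sketched with Python's standard `concurrent.futures` module. This is a sketch under stated assumptions: worker threads stand in for the ten separate computers, the synthetic records and the names `summarise` and `merge` are hypothetical, and only one pair of variables is summarised to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def summarise(records):
    """Partial knowledge for one group of (x, y) records: (n, sum_x, sum_y, sum_xy)."""
    n = len(records)
    sum_x = sum(x for x, _ in records)
    sum_y = sum(y for _, y in records)
    sum_xy = sum(x * y for x, y in records)
    return (n, sum_x, sum_y, sum_xy)

def merge(a, b):
    """Combine two partial summaries by element-wise addition."""
    return tuple(p + q for p, q in zip(a, b))

# Synthetic data for illustration only.
records = [(x, 2 * x + 1) for x in range(1000)]
parts = [records[k::10] for k in range(10)]       # ten groups of records

with ThreadPoolExecutor(max_workers=10) as pool:  # stand-in for computers 1 to 10
    partials = list(pool.map(summarise, parts))

combined = reduce(merge, partials)                # the "eleventh computer"
print(combined == summarise(records))             # True
```

The merge step handles ten small tuples regardless of how many records each group contained, which is why the combining time is small compared with the time spent summarising.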

To demonstrate this attribute, the following example considers a very small
dataset of six records and an example of interpretation of dryer output rate data from
three dryers. If, for example, the output rate from the third dryer is to be predicted
from the output rate from the other two dryers then an equation is required relating it
to these other two output rates. The data is shown in the table below where $X_1$, $X_2$ and
$X_3$ represent the three output rates. The sample dataset with six records and three
variables is set forth below at Table 4.



$X_1$ | $X_2$ | $X_3$
------|-------|------
2     | 3     | 5
3     | 4     | 7
1     | 1     | 3
2     | 3     | 6
4     | 4     | 8
3     | 5     | 7

Table 4



With such a small amount of data it is practical to use multiple linear
regression to obtain the needed relationship. Multiple linear regression for the
dataset shown in Table 4 provides the relationship:

$X_3 = 1.652 + 1.174 X_1 + 0.424 X_2$
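
The relationship above can be reproduced from sums alone. The following Python sketch follows the covariance, correlation, standardized coefficient, rescaling and intercept steps of Figure 4 for the six records of Table 4. It assumes the population form of the covariance and uses a closed-form solution for two predictors, which is a simplification chosen for this example; the variable names are hypothetical.

```python
import math

# The six records of Table 4 as (X1, X2, X3).
records = [(2, 3, 5), (3, 4, 7), (1, 1, 3), (2, 3, 6), (4, 4, 8), (3, 5, 7)]
n = len(records)

# Knowledge elements: column sums and sums of pairwise products.
s = [sum(r[i] for r in records) for i in range(3)]
sp = [[sum(r[i] * r[j] for r in records) for j in range(3)] for i in range(3)]

# Step 102: covariance table from the stored sums.
cov = [[(sp[i][j] - s[i] * s[j] / n) / n for j in range(3)] for i in range(3)]

# Step 104: correlation table.
r = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(3)]
     for i in range(3)]

# Step 107: standardized coefficients (closed-form solve for two predictors).
det = 1 - r[0][1] ** 2
beta1 = (r[0][2] - r[0][1] * r[1][2]) / det
beta2 = (r[1][2] - r[0][1] * r[0][2]) / det

# Step 108: rescale by the standard deviations.
sd = [math.sqrt(cov[i][i]) for i in range(3)]
b1 = beta1 * sd[2] / sd[0]
b2 = beta2 * sd[2] / sd[1]

# Step 109: intercept from the means.
a = s[2] / n - b1 * s[0] / n - b2 * s[1] / n

print(round(a, 3), round(b1, 3), round(b2, 3))  # 1.652 1.174 0.424
```

After the sums `s` and `sp` have been formed, the individual records are not used again, so the same calculation applies unchanged to sums accumulated from any number of records.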

However, if this dataset consisted of a billion records instead of only six then
multiple linear regression on the whole dataset at once would not be practical. The
conventional approach would be to take only a random sample of the data and obtain
a multiple linear regression model from that, hoping that the resulting model would
represent the entire dataset.

Using the knowledge entity 46, the analytical engine can use the entire dataset
for the regression model, regardless of the size of the data set. This can be illustrated
using only the six records shown as follows and dividing the dataset into only three
groups.

Step 1: Divide the dataset into three subsets with two records in each, and
compute a knowledge entity for each subset. The data in subset 1 has the form shown
below in Table 5.

Subset 1:

$X_1$ | $X_2$ | $X_3$
------|-------|------
2     | 3     | 5
3     | 4     | 7

Table 5

From the data in Table 5 above, a knowledge entity I (Table 6) is calculated
for subset 1 (Table 5) using a first computer.





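      | Element        | $X_1$ | $X_2$ | $X_3$
------|----------------|-------|-------|------
$X_1$ | $N_{ij}$       | 2     | 2     | 2
      | $\sum X_i$     | 5     | 5     | 5
      | $\sum X_j$     | 5     | 7     | 12
      | $\sum X_i X_j$ | 13    | 18    | 31
$X_2$ | $N_{ij}$       | 2     | 2     | 2
      | $\sum X_i$     | 7     | 7     | 7
      | $\sum X_j$     | 5     | 7     | 12
      | $\sum X_i X_j$ | 18    | 25    | 43
$X_3$ | $N_{ij}$       | 2     | 2     | 2
      | $\sum X_i$     | 12    | 12    | 12
      | $\sum X_j$     | 5     | 7     | 12
      | $\sum X_i X_j$ | 31    | 43    | 74

Table 6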



As described above, the knowledge entity 46 is built by using basic units,
each of which includes an input variable $X_j$, an output variable $X_i$ and a set of
combinations indicated as $W_{ij}$, as shown in Table 7:

      | $X_j$
------|----------
$X_i$ | $W_{ij}$

Table 7

where $W_{ij}$ includes one or more of the following four basic elements:

$N_{ij}$ is the total number of joint occurrences of the two variables
$\sum X_i$ is the sum of variable $X_i$
$\sum X_j$ is the sum of variable $X_j$
$\sum X_i X_j$ is the sum of the products of variables $X_i$ and $X_j$

In some applications it may be advantageous to include additional knowledge
elements for specific calculation reasons. For example, ΣX³, ΣX⁴ and Σ(XiXj)² can
generally be included in the knowledge entity in addition to the four basic elements
mentioned above without adversely affecting the intelligent modeling capabilities.

The data in subset 2 has the form shown below in Table 8.
Subset 2:

X1    X2    X3
1     1     3
2     3     6




Table 8 



A knowledge entity II (Table 9) is calculated for subset 2 (Table 8) using a
second computer.

               X1      X2      X3
X1   N         2       2       2
     ΣXi       3       3       3
     ΣXj       3       4       9
     ΣXiXj     5       7       15
X2   N         2       2       2
     ΣXi       4       4       4
     ΣXj       3       4       9
     ΣXiXj     7       10      21
X3   N         2       2       2
     ΣXi       9       9       9
     ΣXj       3       4       9
     ΣXiXj     15      21      45




Table 9 



Similarly, for subset 3 shown in Table 10, a knowledge entity III (Table 11) is
computed using a third computer.

Subset 3:

X1    X2    X3
4     4     8
3     5     7


Table 10 





               X1      X2      X3
X1   N         2       2       2
     ΣXi       7       7       7
     ΣXj       7       9       15
     ΣXiXj     25      31      53
X2   N         2       2       2
     ΣXi       9       9       9
     ΣXj       7       9       15
     ΣXiXj     31      41      67
X3   N         2       2       2
     ΣXi       15      15      15
     ΣXj       7       9       15
     ΣXiXj     53      67      113




Table 11 



Step 2: Calculate a knowledge entity IV (Table 12) by adding together the
three previously calculated knowledge tables using a fourth computer.

               X1      X2      X3
X1   N         6       6       6
     ΣXi       15      15      15
     ΣXj       15      20      36
     ΣXiXj     43      56      99
X2   N         6       6       6
     ΣXi       20      20      20
     ΣXj       15      20      36
     ΣXiXj     56      76      131
X3   N         6       6       6
     ΣXi       36      36      36
     ΣXj       15      20      36
     ΣXiXj     99      131     232



Table 12 
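
Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `knowledge_entity` and `combine`, and the choice of a dictionary keyed by variable-index pairs, are hypothetical and do not come from the specification. Each cell holds the four basic elements (N, ΣXi, ΣXj, ΣXiXj).

```python
from itertools import product

def knowledge_entity(records):
    """Return {(i, j): (N, sum Xi, sum Xj, sum XiXj)} for every variable pair."""
    n_vars = len(records[0])
    entity = {}
    for i, j in product(range(n_vars), repeat=2):
        entity[(i, j)] = (
            len(records),
            sum(r[i] for r in records),
            sum(r[j] for r in records),
            sum(r[i] * r[j] for r in records),
        )
    return entity

def combine(*entities):
    """Add knowledge entities element by element (Step 2)."""
    keys = entities[0].keys()
    return {k: tuple(sum(vals) for vals in zip(*(e[k] for e in entities)))
            for k in keys}

# The three subsets of Tables 5, 8 and 10
subset1 = [(2, 3, 5), (3, 4, 7)]
subset2 = [(1, 1, 3), (2, 3, 6)]
subset3 = [(4, 4, 8), (3, 5, 7)]

entity_iv = combine(*(knowledge_entity(s) for s in (subset1, subset2, subset3)))
whole = knowledge_entity(subset1 + subset2 + subset3)
```

Because every element is a plain sum, the combined entity is identical to the one computed from the undivided data, which is what allows the subsets to be processed on separate computers.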



Step 3: Calculate the covariance matrix from knowledge entity IV using the
following equation. If i = j, the covariance is the variance. Each of the terms used
in the covariance matrix is available from the composite knowledge entity shown in
Table 12.

$\mathrm{Covar}_{ij} = \dfrac{\sum X_i X_j - \dfrac{\sum X_i \sum X_j}{N_{ij}}}{N_{ij}}$



Table 13 



The resulting covariance matrix from Table 12 is set out below at Table 14.

        X1             X2             X3
X1      0.916666667    1              1.5
X2      1              1.555555556    1.833333333
X3      1.5            1.833333333    2.666666667



Table 14

Step 4: Calculate the correlation matrix from the covariance matrix using the
following equation.

$r_{ij} = \dfrac{\mathrm{Covar}_{ij}}{\sqrt{\mathrm{Var}_i \times \mathrm{Var}_j}}$

where:

$\mathrm{Var}_i = \mathrm{Covar}_{ii}$
$\mathrm{Var}_j = \mathrm{Covar}_{jj}$



Table 15 
Correlation matrix:

        X1             X2             X3
X1      1              0.837435789    0.959403224
X2      0.837435789    1              0.900148797
X3      0.959403224    0.900148797    1



Table 16 
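
As a check on Steps 3 and 4, the covariance and correlation entries can be recomputed directly from the sums in Table 12. The sketch below is illustrative only; the names `sums`, `cross`, `covar` and `corr` are assumptions introduced here, not part of the specification. It uses the population form of the covariance (division by N), which is what reproduces the values in Table 14.

```python
from math import sqrt

N = 6
sums = {1: 15, 2: 20, 3: 36}                       # sum Xi from Table 12
cross = {(1, 1): 43, (1, 2): 56, (1, 3): 99,
         (2, 2): 76, (2, 3): 131, (3, 3): 232}     # sum XiXj from Table 12

def covar(i, j):
    """Covariance of Xi and Xj using only knowledge elements."""
    sxy = cross[(min(i, j), max(i, j))]
    return (sxy - sums[i] * sums[j] / N) / N

def corr(i, j):
    """Correlation of Xi and Xj from the covariance entries."""
    return covar(i, j) / sqrt(covar(i, i) * covar(j, j))
```

No record of the original dataset is touched; only the ten distinct sums are needed.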



Step 5: Select the dependent variable y (X3) and then slice the correlation
matrix into a matrix for the independent variables R_ij and a vector for the
dependent variable R_yj. Calculate the population coefficients β_j for the
independent variables Xj using the relationship:

$\beta_j = R_{ij}^{-1} R_{yj}$

From Table 16, a dependent variable correlation vector R_yj is obtained as
shown in Table 17.

        X3
X1      0.959403224
X2      0.900148797

Table 17

Similarly, the independent variables correlation matrix R_ij and its inverse
matrix R_ij^{-1} for X1 and X2 are obtained from Table 16 as set forth below at
Tables 18 and 19 respectively.





        X1             X2
X1      1              0.837435789
X2      0.837435789    1

Table 18

        X1              X2
X1      3.347826087     -2.803589382
X2      -2.803589382    3.347826087

Table 19

Calculate the β vector from Tables 17 and 19 to obtain:

        β
X1      0.68826753
X2      0.32376893



Table 20 



Step 6: Calculate the sample coefficients b_j:

$b_j = \beta_j \,(s_y / s_j)$

s_y is the sample standard deviation of the dependent variable X3 and s_j the
sample standard deviation of the independent variables (X1, X2), which can be easily
calculated from the knowledge entity 46.

b1 = 0.68826753 × (1.788854382 / 1.048808848) = 1.173913043 ≈ 1.174
b2 = 0.32376893 × (1.788854382 / 1.366260102) = 0.423913043 ≈ 0.424

Step 7: Calculate the intercept a from the following equation (Y is X3 in our
example):

$a = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 - \dots - b_n \bar{X}_n$

where any mean value can be calculated from ΣXi / Nii.

a = 6 - (1.174 × 2.5) - (0.424 × 3.3333) = 1.652173913 ≈ 1.652

Step 8: Finally, the linear equation which can be used for the prediction is:

$X_3 = 1.652 + 1.174\,X_1 + 0.424\,X_2$

which will be recognised as the same equation calculated from the whole dataset.
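
Steps 3 through 7 can be traced end to end from the sums of Table 12 alone. The following sketch is an illustration, not the specification's implementation: variable names are assumptions, and the 2 × 2 inverse of Step 5 is written out by hand rather than using a general matrix routine. Note that the ratio of sample standard deviations in Step 6 equals the ratio of population standard deviations, since the N/(N-1) factor cancels.

```python
from math import sqrt

N = 6
s1, s2, s3 = 15, 20, 36          # sum X1, sum X2, sum X3
s11, s12, s13 = 43, 56, 99       # sum X1X1, sum X1X2, sum X1X3
s22, s23, s33 = 76, 131, 232     # sum X2X2, sum X2X3, sum X3X3

def cov(sxy, sx, sy):
    return (sxy - sx * sy / N) / N

v1, v2, v3 = cov(s11, s1, s1), cov(s22, s2, s2), cov(s33, s3, s3)
r12 = cov(s12, s1, s2) / sqrt(v1 * v2)
r13 = cov(s13, s1, s3) / sqrt(v1 * v3)
r23 = cov(s23, s2, s3) / sqrt(v2 * v3)

# Step 5: beta = R^-1 * Ry, written out for two independent variables
det = 1 - r12 ** 2
beta1 = (r13 - r12 * r23) / det
beta2 = (r23 - r12 * r13) / det

# Step 6: b_j = beta_j * (s_y / s_j)
b1 = beta1 * sqrt(v3 / v1)
b2 = beta2 * sqrt(v3 / v2)

# Step 7: intercept from the means, each mean being (sum Xi) / N
a = s3 / N - b1 * s1 / N - b2 * s2 / N
```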

The above examples have used a linear regression model. Using the
knowledge entity 46, the analytical engine can also develop intelligent versions of
other models, including, but not limited to, non-linear regression, linear
classification, non-linear classification, robust Bayesian classification, naive
Bayesian classification, Markov chains, hidden Markov models, principal component
analysis, principal component regression, partial least squares, and decision trees.

An example of each of these will be provided, utilising the data obtained from 
the process of Figure 1. Again, it will be recognised that this procedure is not process 
dependent but may be used with any set of data. 

LINEAR CLASSIFICATION 

As mentioned above, effective scenario testing depends upon being able to
examine a wide variety of mathematical models to see future possibilities and assess
relationships amongst variables while examining how well the existing data is
explained and how well new results can be predicted. The analytical engine
provides an extremely effective method for accomplishing scenario testing. One
important attribute is that it enables many different modeling methods to be
examined, including some that involve qualitative (categorical) as well as
quantitative (numerical) quantities. Classification is used when the output
(dependent) variable is a categorical variable. Categorical variables can take on
distinct values, such as colours (red, green, blue) or sizes (small, medium, large).
In the embodiment of the dryer 10, a filter may be provided in the vent 20, and
optionally removed. A categorical variable for the filter has possible values "on"
and "off" reflective of the status of the filter. Suppose the dependent variable Xi
has k values. Instead of just one regression model we build k models by using the
same steps as set out above with reference to a model using linear regression.

$X_{i1} = a_1 + b_{11} X_1 + b_{21} X_2 + \dots + b_{n1} X_n$
$X_{i2} = a_2 + b_{12} X_1 + b_{22} X_2 + \dots + b_{n2} X_n$
...
$X_{ik} = a_k + b_{1k} X_1 + b_{2k} X_2 + \dots + b_{nk} X_n$

In the prediction phase, each of the models for Xi1, ..., Xik is used to construct
an estimate corresponding to each of the k possible values. The k models compete
with each other and the model with the highest value will be the winner, and
determines the predicted one of the k possible values. The following equation
transforms the actual value to a probability:

$P(X_{ik}) = 1 / (1 + \exp(-X_{ik}))$

Suppose we have a model with two variables (X1, X2) and X2 is a categorical
variable with values (A, B). In the example of the dryer, A corresponds to the filter
being on, and B corresponds to the filter being off. The knowledge entity 46 for this
model is going to have one column/row for each categorical value (X2A, X2B):

$X_{2A} = a_A + b_{1A} X_1$
$X_{2B} = a_B + b_{1B} X_1$

Table 21 shows a knowledge entity 46 with a categorical variable X2.





                  X1            X2
                  X1            X2A            X2B
X1    N           N1,1          N1,2A          N1,2B
      ΣXi         ΣX1           ΣX1            ΣX1
      ΣXj         ΣX1           ΣX2A           ΣX2B
      ΣXiXj       ΣX1X1         ΣX1X2A         ΣX1X2B
X2A   N           N2A,1         N2A,2A         N2A,2B
      ΣXi         ΣX2A          ΣX2A           ΣX2A
      ΣXj         ΣX1           ΣX2A           ΣX2B
      ΣXiXj       ΣX2AX1        ΣX2AX2A        ΣX2AX2B
X2B   N           N2B,1         N2B,2A         N2B,2B
      ΣXi         ΣX2B          ΣX2B           ΣX2B
      ΣXj         ΣX1           ΣX2A           ΣX2B
      ΣXiXj       ΣX2BX1        ΣX2BX2A        ΣX2BX2B

Table 21 



Table 22 shows a knowledge entity 46 for X2A:

                  X1            X2A
X1    N           N1,1          N1,2A
      ΣXi         ΣX1           ΣX1
      ΣXj         ΣX1           ΣX2A
      ΣXiXj       ΣX1X1         ΣX1X2A
X2A   N           N2A,1         N2A,2A
      ΣXi         ΣX2A          ΣX2A
      ΣXj         ΣX1           ΣX2A
      ΣXiXj       ΣX2AX1        ΣX2AX2A

Table 22 



Table 23 shows a knowledge entity 46 for X2B:

                  X1            X2B
X1    N           N1,1          N1,2B
      ΣXi         ΣX1           ΣX1
      ΣXj         ΣX1           ΣX2B
      ΣXiXj       ΣX1X1         ΣX1X2B
X2B   N           N2B,1         N2B,2B
      ΣXi         ΣX2B          ΣX2B
      ΣXj         ΣX1           ΣX2B
      ΣXiXj       ΣX2BX1        ΣX2BX2B

Table 23

The knowledge entity 46 shown in Tables 22 and 23 may then be applied to
model each value of the categorical variable X2. Prediction of the categorical
variable is then performed by predicting a score for each possible value. The
possible value with the highest score is chosen as the value of the categorical
variable. The analytical engine thus enables the development of models which involve
categorical as well as numerical variables.
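
The competing-models idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the four records are invented, the categorical values are encoded as 0/1 indicator columns (so that ΣX2A is the count of "A" records and ΣX1X2A is the sum of X1 over those records), and the names `fit_class` and `predict` are hypothetical.

```python
from math import exp

# Hypothetical records: (X1, X2) with X2 taking the values "A" or "B"
records = [(1.0, "A"), (2.0, "A"), (3.0, "B"), (4.0, "B")]

def fit_class(label):
    """Fit X2_label = a + b * X1 from the four basic knowledge elements."""
    n = len(records)
    sx = sum(x for x, _ in records)                    # sum X1
    sxx = sum(x * x for x, _ in records)               # sum X1X1
    sy = sum(1 for _, c in records if c == label)      # sum of the indicator
    sxy = sum(x for x, c in records if c == label)     # sum of X1 * indicator
    b = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    a = sy / n - b * sx / n
    return a, b

models = {label: fit_class(label) for label in ("A", "B")}

def predict(x1):
    """Return the winning value and its score passed through 1/(1+exp(-score))."""
    scores = {label: a + b * x1 for label, (a, b) in models.items()}
    winner = max(scores, key=scores.get)
    return winner, 1 / (1 + exp(-scores[winner]))
```

For example, `predict(1.5)` selects "A" and `predict(3.5)` selects "B" for this invented data.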

NON-LINEAR REGRESSION AND CLASSIFICATION 

The analytical engine is not limited to the generation of linear mathematical
models. If the appropriate model is non-linear, then the knowledge entity shown in
Figure 3 is also used. The combinations used in the table are sufficient to compute
the non-linear regression.

The method of Figure 7 showed how to expand the knowledge entity 46 to
include additional variables. This feature also allows the construction of non-linear
regression or classification models. It is noted that non-linearity is about
variables, not coefficients. Suppose we have a linear model with two variables
(X1, X2) but we believe Log(X1) could give us a better result. The only thing we need
to do is to follow the three steps for adding a new variable. Log(X1) will be the
third variable in the knowledge entity 46 and a regression model can be constructed
in the explained steps. If we do not need X1 anymore it can be removed by using the
contraction feature described above.





                 X1          X2          X3 = Log(X1)
X1    N          N11         N12         N13
      ΣXi        ΣX1         ΣX1         ΣX1
      ΣXj        ΣX1         ΣX2         ΣX3
      ΣXiXj      ΣX1X1       ΣX1X2       ΣX1X3
X2    N          N21         N22         N23
      ΣXi        ΣX2         ΣX2         ΣX2
      ΣXj        ΣX1         ΣX2         ΣX3
      ΣXiXj      ΣX2X1       ΣX2X2       ΣX2X3
X3    N          N31         N32         N33
      ΣXi        ΣX3         ΣX3         ΣX3
      ΣXj        ΣX1         ΣX2         ΣX3
      ΣXiXj      ΣX3X1       ΣX3X2       ΣX3X3



Table 24

Once the knowledge entity 46 has been constructed, the learner 44 can acquire
data as shown in Figure 7. The new variable X3 notionally represents a new sensor
which measures the logarithm of X1. However, values of the new variable X3 may be
computed from values of X1 by a processor rather than by a special sensor. Regardless
of how the values are obtained, the learner 44 builds the knowledge entity 46. Then
the modeller 48 determines a linear regression of the three variables X1, X2, X3,
where X3 is a non-linear function of X1. It will therefore be recognised that
operation of the controller 40 is similar for the non-linear regression when the
variables are regarded as X1, X2, and X3. The predictor 50 can use a model such as
X2 = a + b1 X1 + b3 X3 to predict variables such as X2.



DIMENSION REDUCTION 



As stated earlier, reducing the number of variables in a model is termed
"dimension reduction". Dimension reduction can be done by deleting a variable. As
shown earlier, using the knowledge entity the analytical engine easily accommodates
this without using the whole database and a tedious re-calibration or re-training
step. Such dimension reduction can also be done by the analytical engine using the
sum of two variables or the difference between two variables as a new variable.
Again, the knowledge entity permits this step to be done expeditiously and makes
extremely comprehensive testing of different combinations of variables practical,
even with very large data sets. Suppose we have a knowledge entity with three
variables but we want to decrease the dimension by adding two variables (X1, X2). For
example, the knowledge elements in the knowledge entity associated with the new
variable X4, which is the sum of two other variables, X1 and X2, are calculated as
follows:

(1) $X_4 = X_1 + X_2$

(2) $\sum X_4 = \sum (X_1 + X_2) = \sum X_1 + \sum X_2$

(3) $\sum X_4 X_3 = \sum (X_1 + X_2) X_3 = \sum X_1 X_3 + \sum X_2 X_3$

(4) $\sum X_4 X_4 = \sum (X_1 + X_2)(X_1 + X_2) = \sum X_1 X_1 + 2 \sum X_1 X_2 + \sum X_2 X_2$

Table 25

This is a recursive process and can decrease a model with N dimensions to just
one dimension if it is needed. That is, a new variable X5 can be defined as the sum
of X4 and X3.



Alternatively, if we decide to accomplish the dimension reduction by
subtracting the two variables, then the relevant knowledge elements for the new
variable X4 are:

(1) $X_4 = X_1 - X_2$

(2) $\sum X_4 = \sum (X_1 - X_2) = \sum X_1 - \sum X_2$

(3) $\sum X_4 X_3 = \sum (X_1 - X_2) X_3 = \sum X_1 X_3 - \sum X_2 X_3$

(4) $\sum X_4 X_4 = \sum (X_1 - X_2)(X_1 - X_2) = \sum X_1 X_1 - 2 \sum X_1 X_2 + \sum X_2 X_2$

Table 26

The knowledge elements in the above tables can all be obtained from the
knowledge elements in the original knowledge entity obtained from the original data
set. That is, the knowledge entity computed for the models without dimension
reduction provides the information needed for construction of the knowledge entity
for the dimension reduced models.

Now, returning to the example of Table 4 showing the output rates for three
different dryers, the knowledge entity for the sample dataset is:





        X1               X2               X3
X1      N11 = 6          N12 = 6          N13 = 6
        ΣX1 = 15         ΣX1 = 15         ΣX1 = 15
        ΣX1 = 15         ΣX2 = 20         ΣX3 = 36
        ΣX1X1 = 43       ΣX1X2 = 56       ΣX1X3 = 99
X2      N21 = 6          N22 = 6          N23 = 6
        ΣX2 = 20         ΣX2 = 20         ΣX2 = 20
        ΣX1 = 15         ΣX2 = 20         ΣX3 = 36
        ΣX2X1 = 56       ΣX2X2 = 76       ΣX2X3 = 131
X3      N31 = 6          N32 = 6          N33 = 6
        ΣX3 = 36         ΣX3 = 36         ΣX3 = 36
        ΣX1 = 15         ΣX2 = 20         ΣX3 = 36
        ΣX3X1 = 99       ΣX3X2 = 131      ΣX3X3 = 232

Table 27

Table 27 has the same quantities as did Table 12. Table 12 was calculated by
combining the knowledge entities from data obtained from dividing the original data
set into three portions (to illustrate distributed processing and parallel
processing). The above knowledge entity was calculated from the original undivided
dataset.

Now, to show dimension reduction can be accomplished by means other than
removal of a variable, the data set for variables X4 and X3 (where X4 = X1 + X2) is:

X4 = X1 + X2    X3
5               5
7               7
2               3
5               6
8               8
8               7



Table 28 



The knowledge entity for the X4, X3 data set above is:

        X4               X3
X4      N44 = 6          N43 = 6
        ΣX4 = 35         ΣX4 = 35
        ΣX4 = 35         ΣX3 = 36
        ΣX4X4 = 231      ΣX4X3 = 230
X3      N34 = 6          N33 = 6
        ΣX3 = 36         ΣX3 = 36
        ΣX4 = 35         ΣX3 = 36
        ΣX3X4 = 230      ΣX3X3 = 232



Table 29 



Note that exactly the same knowledge entity can be obtained from the knowledge 
entity for all three variables and the use of the expressions in Table 25 above. 

        X4                                    X3
X4      N44 = 6                               N43 = 6
        ΣX4 = 15 + 20 = 35                    ΣX4 = 15 + 20 = 35
        ΣX4 = 15 + 20 = 35                    ΣX3 = 36
        ΣX4X4 = 43 + (2 × 56) + 76 = 231      ΣX4X3 = 99 + 131 = 230
X3      N34 = 6                               N33 = 6
        ΣX3 = 36                              ΣX3 = 36
        ΣX4 = 15 + 20 = 35                    ΣX3 = 36
        ΣX3X4 = 99 + 131 = 230                ΣX3X3 = 232




Table 30 
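
The identities of Tables 25 and 26 can be checked numerically against the raw records of Table 4. In this sketch the function name `merged` and the `sign` parameter (+1 for a sum, -1 for a difference) are illustrative assumptions.

```python
# Knowledge elements for X1, X2, X3 taken from Table 12
s = {1: 15, 2: 20, 3: 36}
c = {(1, 1): 43, (1, 2): 56, (1, 3): 99,
     (2, 2): 76, (2, 3): 131, (3, 3): 232}

def merged(sign):
    """Knowledge elements (sum X4, sum X4X3, sum X4X4) for X4 = X1 + sign * X2."""
    s4 = s[1] + sign * s[2]
    s43 = c[(1, 3)] + sign * c[(2, 3)]
    s44 = c[(1, 1)] + 2 * sign * c[(1, 2)] + c[(2, 2)]
    return s4, s43, s44

# Direct computation from the records of Table 4, for comparison only
data = [(2, 3, 5), (3, 4, 7), (1, 1, 3), (2, 3, 6), (4, 4, 8), (3, 5, 7)]

def direct(sign):
    x4 = [x1 + sign * x2 for x1, x2, _ in data]
    x3 = [r[2] for r in data]
    return sum(x4), sum(a * b for a, b in zip(x4, x3)), sum(a * a for a in x4)
```

Both routes give (35, 230, 231) for the sum, matching Tables 29 and 30.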



DYNAMIC QUERIES 

The analytical engine can also enable "dynamic queries" to select one or more
sequences of a series of questions based on answers given to the questions so as to
rapidly converge on one or more outcomes. The Analytical Engine can be used with
different models to derive the "next best question" in the dynamic query. Two of the
most important are regression models and classification models. For example,
regression models can be used by obtaining the correlation matrix from the
knowledge entity.

The Correlation Matrix:

        X1      ...     Xj      ...     Xn
X1      r11     ...     r1j     ...     r1n
...     ...     ...     ...     ...     ...
Xi      ri1     ...     rij     ...     rin
...     ...     ...     ...     ...     ...
Xm      rm1     ...     rmj     ...     rmn

Table 31

Then, the following steps are carried out:

Step 1: Calculate the covariance matrix. (Note: if i = j the covariance is the
variance.)

$\mathrm{Covar}_{ij} = \dfrac{\sum X_i X_j - \dfrac{\sum X_i \sum X_j}{N_{ij}}}{N_{ij}}$

Table 32

Step 2: Calculate the correlation matrix from the covariance matrix. (Note: if
i = j the elements of the matrix are unity.)

$r_{ij} = \dfrac{\mathrm{Covar}_{ij}}{\sqrt{\mathrm{Var}_i \times \mathrm{Var}_j}}$

where:

$\mathrm{Var}_i = \mathrm{Covar}_{ii}$
$\mathrm{Var}_j = \mathrm{Covar}_{jj}$

Table 33



Once these steps are completed the Analytical Engine can supply the "next
best question" in a dynamic query as follows:

1. Select the dependent variable Xd.

2. Select an independent variable Xi with the highest correlation to Xd. If Xi has
already been selected, select the next best one.

3. Continue until there are no independent variables left or some criterion has been
met (e.g., no significant change in R²).
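
A minimal sketch of this ordering, using the correlations of Table 16 with X3 as the dependent variable, is given below. Two assumptions are made that the text does not state: candidates are ranked by the absolute value of the correlation, and the stopping criterion of step 3 is omitted so that the full ordering is returned. The function names are hypothetical.

```python
# Off-diagonal correlations from Table 16
corr = {("X1", "X2"): 0.837435789,
        ("X1", "X3"): 0.959403224,
        ("X2", "X3"): 0.900148797}

def r(a, b):
    """Look up a correlation regardless of the order of the two names."""
    if a == b:
        return 1.0
    return corr.get((a, b), corr.get((b, a)))

def next_best_questions(dependent, variables):
    """Order the independent variables by strength of correlation with the target."""
    candidates = [v for v in variables if v != dependent]
    return sorted(candidates, key=lambda v: abs(r(v, dependent)), reverse=True)
```

With X3 as the target, X1 would be asked about first and X2 second.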

Classification methods can also be used by the Analytical Engine to supply the 
next best question. The analytical engine selects the variable to be examined next (the 
"next best question") in order to obtain the maximum impact on the target probability 
(e.g. probability of default in credit assessment). The user can decide at what point to 
stop asking questions by examining that probability. 

The general structure of this Knowledge Entity for using classification for
dynamic query is:

        X1      ...     Xj      ...     Xn
X1      N11     ...     N1j     ...     N1n
...     ...     ...     ...     ...     ...
Xi      Ni1     ...     Nij     ...     Nin
...     ...     ...     ...     ...     ...
Xm      Nm1     ...     Nmj     ...     Nmn



Table 34 
where the ... are "ditto" marks. 

The analytical engine uses this knowledge entity as follows: 

1. Calculate $T_j = \sum_i N_{ij}$ (i = 1...m; j = 1...n)

2. Select Xc (column variables, c = 1...n) with the highest T. If Xc has already been
selected, select the next best one.

3. Calculate $S_i = S_i \times (N_{ic} / N_{ii})$ or $S_i = S_i \times (N_{ic} / \sum N_{ic})$
for all variables (i = 1...m)

4. Select Xr (row variables, r = 1...m) with the highest S. If Xr has already been
selected, select the next best one.

5. Select Rule Out (Exclude) or Rule In (Include) strategy

a. Rule Out: calculate $T_j = N_{rj} / N_{rr}$ for all variables where Xr <> Xj
(j = 1...n)

b. Rule In: calculate $T_j = N_{rj} / \sum_i N_{ij}$ for all variables where Xr <> Xj
(j = 1...n; i = 1...m)

6. Go to step 2 and repeat steps 2 through 5 until the desired target probability is 
reached or exceeded. 



NORMALIZED KNOWLEDGE ENTITY 

Some embodiments preferably employ particular forms of the knowledge
entity. For example, if the knowledge elements are normalized the performance of
some modeling methods can be improved. A normalized knowledge entity can be
expressed in terms of well known statistical quantities termed "Z" values. To do
this, ΣXi, ΣXiXj, μ and σ can be extracted from the un-normalized knowledge entity
and used as shown below:

(1) $Z_i = \dfrac{X_i - \mu_i}{\sigma_i}$

(2) $\sum Z_i = \dfrac{\sum X_i - \sum \mu_i}{\sigma_i} = 0$

(3) $\sum Z_i Z_j = \sum \left(\dfrac{X_i - \mu_i}{\sigma_i}\right)\left(\dfrac{X_j - \mu_j}{\sigma_j}\right) = \dfrac{\sum X_i X_j - \mu_j \sum X_i - \mu_i \sum X_j + N_{ij}\,\mu_i \mu_j}{\sigma_i \sigma_j}$

where:

$\mu_i = \dfrac{\sum X_i}{N_{ii}}, \quad \mu_j = \dfrac{\sum X_j}{N_{jj}}$

$\sigma_i = \sqrt{\dfrac{\sum X_i X_i - \dfrac{(\sum X_i)^2}{N_{ii}}}{N_{ii}}}, \quad \sigma_j = \sqrt{\dfrac{\sum X_j X_j - \dfrac{(\sum X_j)^2}{N_{jj}}}{N_{jj}}}$

Table 35

Returning again to the three dryer data of Table 4, the un-normalized knowledge
entity was given in Table 12, and the normalized one is provided below.



NORMALIZED KNOWLEDGE ENTITY FOR THE SAMPLE DATASET:

        Z1                   Z2                   Z3
Z1      N11 = 6              N12 = 6              N13 = 6
        ΣZ1 = 0              ΣZ1 = 0              ΣZ1 = 0
        ΣZ1 = 0              ΣZ2 = 0              ΣZ3 = 0
        ΣZ1Z1 = 6            ΣZ1Z2 = 5.024615     ΣZ1Z3 = 5.756419
Z2      N21 = 6              N22 = 6              N23 = 6
        ΣZ2 = 0              ΣZ2 = 0              ΣZ2 = 0
        ΣZ1 = 0              ΣZ2 = 0              ΣZ3 = 0
        ΣZ2Z1 = 5.024615     ΣZ2Z2 = 6            ΣZ2Z3 = 5.400893
Z3      N31 = 6              N32 = 6              N33 = 6
        ΣZ3 = 0              ΣZ3 = 0              ΣZ3 = 0
        ΣZ1 = 0              ΣZ2 = 0              ΣZ3 = 0
        ΣZ3Z1 = 5.756419     ΣZ3Z2 = 5.400893     ΣZ3Z3 = 6




Table 36 
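
The entries of Table 36 can be reproduced from the records of Table 4 with a short sketch. It assumes the population standard deviation (division by N), which is the form that makes each ΣZiZi equal to N; the helper names are illustrative.

```python
from math import sqrt

data = [(2, 3, 5), (3, 4, 7), (1, 1, 3), (2, 3, 6), (4, 4, 8), (3, 5, 7)]
n = len(data)

def z_scores(i):
    """Z values of variable i (0-based), using mean and population std dev."""
    xs = [r[i] for r in data]
    mu = sum(xs) / n
    sigma = sqrt(sum(x * x for x in xs) / n - mu * mu)
    return [(x - mu) / sigma for x in xs]

def sum_zz(i, j):
    """The normalized knowledge element: sum of Zi * Zj."""
    return sum(a * b for a, b in zip(z_scores(i), z_scores(j)))
```

Each off-diagonal ΣZiZj equals N times the corresponding correlation in Table 16, for example 6 × 0.837435789 ≈ 5.024615.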



SERIALIZED KNOWLEDGE ENTITY 

It is also possible to serialize and disperse the knowledge entity to facilitate 
some software applications. 

The general structure of the knowledge entity: 





        X1      ...     Xj      ...     Xn
X1      W11     ...     W1j     ...     W1n
...     ...     ...     ...     ...     ...
Xi      Wi1     ...     Wij     ...     Win
...     ...     ...     ...     ...     ...
Xm      Wm1     ...     Wmj     ...     Wmn



















Table 37 



can be written as the serialized and dispersed structure: 



X1      X1      W11
X1      Xj      W1j
X1      Xn      W1n
...     ...     ...
Xi      X1      Wi1
Xi      Xj      Wij
Xi      Xn      Win
...     ...     ...
Xm      X1      Wm1
Xm      Xj      Wmj
Xm      Xn      Wmn





Table 38 



then the knowledge entity for the three dryer data (Table 4) used above becomes:

Xi    Xj    Nij        ΣXi         ΣXj         ΣXiXj
X1    X1    N11 = 6    ΣX1 = 15    ΣX1 = 15    ΣX1X1 = 43
X1    X2    N12 = 6    ΣX1 = 15    ΣX2 = 20    ΣX1X2 = 56
X1    X3    N13 = 6    ΣX1 = 15    ΣX3 = 36    ΣX1X3 = 99
X2    X2    N22 = 6    ΣX2 = 20    ΣX2 = 20    ΣX2X2 = 76
X2    X3    N23 = 6    ΣX2 = 20    ΣX3 = 36    ΣX2X3 = 131
X3    X3    N33 = 6    ΣX3 = 36    ΣX3 = 36    ΣX3X3 = 232

Table 39
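
One way such a serialized form might be produced in software is sketched below. The row layout (one tuple per unordered variable pair) and the names `names` and `serialize` are assumptions for illustration; the specification does not prescribe a storage format.

```python
data = [(2, 3, 5), (3, 4, 7), (1, 1, 3), (2, 3, 6), (4, 4, 8), (3, 5, 7)]
names = ("X1", "X2", "X3")

def serialize(records):
    """Return rows of (Xi, Xj, N, sum Xi, sum Xj, sum XiXj), one per pair i <= j."""
    rows = []
    for i in range(len(names)):
        for j in range(i, len(names)):
            rows.append((
                names[i], names[j], len(records),
                sum(r[i] for r in records),
                sum(r[j] for r in records),
                sum(r[i] * r[j] for r in records),
            ))
    return rows

rows = serialize(data)
```

Each row is independent of the others, so rows can be stored, transmitted or updated separately.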



ROBUST BAYESIAN CLASSIFICATION 

In some cases, the appropriate model for classification of a categorical
variable may be Robust Bayesian Classification, which is based on Bayes's rule of
conditional probability:

$P(C_k \mid x) = \dfrac{P(x \mid C_k)\,P(C_k)}{P(x)}$

Where:

P(C_k|x) is the conditional probability of C_k given x
P(x|C_k) is the conditional probability of x given C_k
P(C_k) is the prior probability of C_k
P(x) is the prior probability of x

Bayes's rule can be summarized in this simple form:

$\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{normalization factor}}$

A discriminant function may be based on Bayes's rule for each value k of a
categorical variable Y:

$y_k(x) = \ln P(x \mid C_k) + \ln P(C_k)$

If each of the class-conditional density functions P(x|C_k) is taken to be an
independent normal distribution, then we have:

$y_k(x) = -\tfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) - \tfrac{1}{2}\ln|\Sigma_k| + \ln P(C_k)$

There are three elements which the analytical engine needs to extract from the
knowledge entity 46, namely, the mean vector (μ_k), the covariance matrix (Σ_k), and
the prior probability of C_k (P(C_k)).

There are five steps to create the discriminant equation:

Step 1: Slice out the knowledge entity 46 for any C_k where C_k is an Xi.

Step 2: Create the μ vector by simply using two elements in the knowledge
entity 46, ΣX and N, where μ = ΣX / N.

Step 3: Create the covariance matrix (Σ_k) by using four basic elements in
the knowledge entity 46 as follows:

$\mathrm{Covar}_{ij} = \dfrac{\sum X_i X_j - \dfrac{\sum X_i \sum X_j}{N_{ij}}}{N_{ij}}$

Step 4: Calculate P(C_k) by using two elements in the knowledge entity 46,
ΣX and N. If C_k = Xi then

$P(X_i) = \sum X_i / N_{ii}$

Step 5: Form the k discriminant functions.

In the prediction phase these k models compete with each other and the model
with the highest value will be the winner.
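
The discriminant function can be sketched for two numerical variables as follows. The sample values and class labels are invented for illustration, the helper names are hypothetical, and the 2 × 2 inverse and determinant are written out by hand; this is a sketch under those assumptions rather than the specification's implementation. The mean, covariance and prior are all derived from counts and sums, as in Steps 2 to 4.

```python
from math import log

# Hypothetical two-variable observations for each value of a categorical output
samples = {
    "on": [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)],
    "off": [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0), (7.0, 5.0)],
}
total = sum(len(rows) for rows in samples.values())

def class_model(rows):
    """Mean vector, covariance entries (c11, c12, c22) and prior from sums."""
    n = len(rows)
    s1 = sum(a for a, _ in rows)
    s2 = sum(b for _, b in rows)
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    mu = (s1 / n, s2 / n)
    cov = ((s11 - s1 * s1 / n) / n, (s12 - s1 * s2 / n) / n, (s22 - s2 * s2 / n) / n)
    return mu, cov, n / total

def discriminant(x, model):
    """y_k(x) = -1/2 (x-mu)^T S^-1 (x-mu) - 1/2 ln|S| + ln P(C_k)."""
    (m1, m2), (c11, c12, c22), prior = model
    det = c11 * c22 - c12 * c12
    d1, d2 = x[0] - m1, x[1] - m2
    mahalanobis = (c22 * d1 * d1 - 2 * c12 * d1 * d2 + c11 * d2 * d2) / det
    return -0.5 * mahalanobis - 0.5 * log(det) + log(prior)

models = {label: class_model(rows) for label, rows in samples.items()}

def classify(x):
    """The class whose discriminant function has the highest value wins."""
    return max(models, key=lambda label: discriminant(x, models[label]))
```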

NAIVE BAYESIAN CLASSIFICATION 

It may be desirable to use a simplification of Bayesian Classification when the
variables are independent. This simplification is called Naive Bayesian
Classification and also uses Bayes's rule of conditional probability:

$P(C_k \mid x) = \dfrac{P(x \mid C_k)\,P(C_k)}{P(x)}$

Where:

P(C_k|x) is the conditional probability of C_k given x
P(x|C_k) is the conditional probability of x given C_k
P(C_k) is the prior probability of C_k
P(x) is the prior probability of x

When the variables are independent, Bayes's rule may be written as follows:

$P(C_k \mid x) = \dfrac{P(x_1 \mid C_k) \times P(x_2 \mid C_k) \times P(x_3 \mid C_k) \times \dots \times P(x_n \mid C_k) \times P(C_k)}{P(x)}$

It is noted that P(x) is a normalization factor.



There are five steps to create the discriminant equation:

Step 1: Select a row of the knowledge entity 46 for any C_k and suppose C_k = Xi.

Step 2a: If x_j is a value for a categorical variable Xj we have
P(x_j | Xi) = ΣXj / ΣXi. We get ΣXj from Wij and ΣXi from Wii.

Step 2b: If x_j is a value for a numerical variable Xj, we calculate P(x_j | Xi) by
using a density function like this:

$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

Where:

$\mu = \sum X_j / N_{ij}$

$\sigma = \sqrt{\mathrm{Covar}_{jj}}$

Step 3: Calculate P(C_k) by using two elements in the knowledge entity 46,
ΣX and N. If C_k = Xi then

$P(X_i) = \sum X_i / N_{ii}$

Step 4: Calculate P(C_k|x) using

$P(C_k \mid x) = \dfrac{P(x_1 \mid C_k) \times P(x_2 \mid C_k) \times P(x_3 \mid C_k) \times \dots \times P(x_n \mid C_k) \times P(C_k)}{P(x)}$

In the prediction phase these k models compete with each other and the model
with the highest value will be the winner.
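
A compact sketch of these steps with one categorical and one numerical input is shown below. The records, the variable meanings and the function names are invented for illustration. Per-class counts and sums stand in for the knowledge elements, the numerical variable uses the normal density of Step 2b, and P(x) is obtained by normalizing the scores over the classes.

```python
from math import exp, pi, sqrt

# Hypothetical records: (class, colour, temperature)
records = [
    ("on", "red", 1.0), ("on", "red", 2.0), ("on", "blue", 3.0),
    ("off", "blue", 6.0), ("off", "blue", 7.0), ("off", "red", 8.0),
]

def class_elements(label):
    """Count, sum and sum of squares of temperature, and colour counts for a class."""
    rows = [r for r in records if r[0] == label]
    n = len(rows)
    sx = sum(t for _, _, t in rows)
    sxx = sum(t * t for _, _, t in rows)
    colours = {}
    for _, colour, _ in rows:
        colours[colour] = colours.get(colour, 0) + 1
    return n, sx, sxx, colours

def posterior(colour, temp):
    """P(C_k | x) for each class, normalized so the values sum to one."""
    scores = {}
    for label in ("on", "off"):
        n, sx, sxx, colours = class_elements(label)
        mu = sx / n
        var = (sxx - sx * sx / n) / n
        density = exp(-(temp - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)
        scores[label] = (colours.get(colour, 0) / n) * density * (n / len(records))
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}
```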



MARKOV CHAIN 

Another possible model is a Markov Chain, which is particularly expedient for
situations where observed values can be regarded as "states." In a conventional
Markov Chain, each successive state depends only on the state immediately before it.
The Markov Chain can be used to predict future states.

Let X be a set of states (X1, X2, X3 ... Xn) and S be a sequence of random
variables (S0, S1, S2 ... St) each with sample space X. If the probability of
transition from state Xi to Xj depends only on state Xi and not on the previous
states then the process is said to be a Markov chain. A time independent Markov
chain is called a stationary Markov chain. A stationary Markov chain can be described
by an N by N transition matrix, T, where N is the size of the state space, with
entries $T_{ij} = P(S_k = X_i \mid S_{k-1} = X_j)$.

In a k-th order Markov chain, the distribution of S_k depends only on the k
variables immediately preceding it. In a 1st order Markov chain, for example, the
distribution of S_k depends only on S_{k-1}. The transition matrix T_ij for a 1st
order Markov chain is the same as N_ij in the knowledge entity 46. Table 40 shows the
transition matrix T for a 1st order Markov chain extracted from the knowledge entity
46.





        X1      ...     Xj      ...     Xn
X1      N11     ...     N1j     ...     N1n
...     ...     ...     ...     ...     ...
Xi      Ni1     ...     Nij     ...     Nin
...     ...     ...     ...     ...     ...
Xn      Nn1     ...     Nnj     ...     Nnn

Table 40

One weakness of a Markov chain is its unidirectionality, which means S_k
depends just on S_{k-1}, not S_{k+1}. Using the knowledge entity 46 can solve this
problem and even give more flexibility to standard Markov chains. A 1st order Markov
chain can be shown as a simple graph with two nodes (variables) and a connection, as
shown in Figure 10.

Suppose X1 and X2 have two states A and B; then the knowledge entity 46 will
be of the form shown in Table 41.







                X1                  X2
                X1A      X1B        X2A      X2B
X1    X1A       W1A1A    W1A1B      W1A2A    W1A2B
      X1B       W1B1A    W1B1B      W1B2A    W1B2B
X2    X2A       W2A1A    W2A1B      W2A2A    W2A2B
      X2B       W2B1A    W2B1B      W2B2A    W2B2B

Table 41



It is noted that W#A*B indicates the set of combinations of variables at the
intersection of row #A and column *B. The use of the knowledge entity 46 produces a
bi-directional Markov Chain. It will be recognised that each of the above operations
relating to the knowledge entity 46 can be applied to the knowledge entity for the
Markov Chain. It is also possible to have a Markov chain with a combination of
different orders in one knowledge entity 46 and also a continuous Markov chain. These
Markov Chains may then be used to predict future states.
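
A small sketch of a first-order chain built from joint-occurrence counts follows. The observed sequence is invented, and two interpretive assumptions are made: the counts N_ij are row-normalized to obtain transition probabilities, and the "bi-directional" use is illustrated by reading the same counts backwards to find the most frequent predecessor of a state.

```python
from collections import Counter

# Hypothetical observed sequence of states
sequence = ["A", "A", "B", "A", "B", "B", "B", "A"]

# N_ij: number of times state i was immediately followed by state j
counts = Counter(zip(sequence, sequence[1:]))
states = sorted(set(sequence))

def transition_probabilities(state):
    """Row-normalize the counts for one current state."""
    row_total = sum(counts[(state, nxt)] for nxt in states)
    return {nxt: counts[(state, nxt)] / row_total for nxt in states}

def most_likely_next(state):
    probs = transition_probabilities(state)
    return max(probs, key=probs.get)

def most_likely_previous(state):
    """Read the same counts in the reverse direction."""
    return max(states, key=lambda prev: counts[(prev, state)])
```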



HIDDEN MARKOV MODEL 

In a more sophisticated variant of the Markov Model, the states are hidden and 
observed through output or evidence nodes. The actual states cannot be directly 
observed, but the probability of a sequence of states given the output nodes may be 
obtained. 

A Hidden Markov Model (HMM) is a graphical model in the form of a chain.
In a typical HMM there is a sequence of state or hidden nodes S with a set of states
(X1, X2, X3 ... Xn), the output or evidence nodes E with a set of possible outputs
(Y1, Y2, Y3 ... Yn), a transition probability matrix A for the hidden nodes and an
emission probability matrix B for the output nodes, as shown in Figure 11.

Table 42 shows a transition matrix A for a 1st order Hidden Markov
Model extracted from knowledge entity 46.





        X1      ...     Xj      ...     Xn
X1      N11     ...     N1j     ...     N1n
...     ...     ...     ...     ...     ...
Xi      Ni1     ...     Nij     ...     Nin
...     ...     ...     ...     ...     ...
Xn      Nn1     ...     Nnj     ...     Nnn



Table 42

Table 43 shows an emission matrix B for a 1st order Hidden Markov Model
extracted from knowledge entity 46.

        X1      ...     Xj      ...     Xn
Y1      N11     ...     N1j     ...     N1n
...     ...     ...     ...     ...     ...
Yi      Ni1     ...     Nij     ...     Nin
...     ...     ...     ...     ...     ...
Yn      Nn1     ...     Nnj     ...     Nnn



Table 43 

Each of the properties of the knowledge entity 46 can be applied to the
standard Hidden Markov Model. In fact we can show a 1st order HMM with a simple graph
with three nodes (variables) and two connections as shown in Figure 12.

Suppose X1 and X2 have two states (values) A and B and X3 has another two
values C and D; then the knowledge entity 46 will be as shown in Table 44, which
represents a 1st order Hidden Markov Model.





              X1                  X2                  X3
              X1A      X1B        X2A      X2B        X3C      X3D
X1    X1A     W1A1A    W1A1B      W1A2A    W1A2B      W1A3C    W1A3D
      X1B     W1B1A    W1B1B      W1B2A    W1B2B      W1B3C    W1B3D
X2    X2A     W2A1A    W2A1B      W2A2A    W2A2B      W2A3C    W2A3D
      X2B     W2B1A    W2B1B      W2B2A    W2B2B      W2B3C    W2B3D
X3    X3C     W3C1A    W3C1B      W3C2A    W3C2B      W3C3C    W3C3D
      X3D     W3D1A    W3D1B      W3D2A    W3D2B      W3D3C    W3D3D



Table 44 



The Hidden Markov Model can then be used to predict future states and to
determine the probability of a sequence of states given the output and/or observed
values.



PRINCIPAL COMPONENT ANALYSIS 

Another commonly used model is Principal Component Analysis (PCA),
which is used in certain types of analysis. Principal Component Analysis seeks to
determine the most important independent variables.

There are five steps to calculate principal components for a dataset.

Step 1: Compute the covariance or correlation matrix.
Step 2: Find its eigenvalues and eigenvectors.
Step 3: Sort the eigenvalues from large to small.
Step 4: Name the ordered eigenvalues as λ1, λ2, λ3, ... and the corresponding
eigenvectors as v1, v2, v3, ...
Step 5: Select the k largest eigenvalues.

The covariance matrix or correlation matrix are the only prerequisites for PCA,
and both can easily be derived from knowledge entity 46.

The covariance matrix extracted from knowledge entity 46:

                Xj
        Xi      Covar_ij

Table 45

The correlation matrix:

                Xj
        Xi      Covar_ij / sqrt(Var_i * Var_j)

where:

        Var_i = Covar_ii
        Var_j = Covar_jj

Table 46
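
As an illustration of how the entries of Tables 45 and 46 might be obtained from stored sums, the following Python sketch derives a covariance and a correlation from N, ΣX_i, ΣX_j, ΣX_iX_j, ΣX_i² and ΣX_j² alone. The function names, the sample data and the divide-by-N (population) form of the covariance are assumptions made for this sketch and are not taken from the specification.

```python
import math


def covariance(n, sum_x, sum_y, sum_xy):
    """Population covariance from accumulated sums (divide-by-N form, an assumption)."""
    return sum_xy / n - (sum_x / n) * (sum_y / n)


def correlation(n, sum_x, sum_y, sum_xy, sum_xx, sum_yy):
    """Covar_ij / sqrt(Var_i * Var_j), with Var_i = Covar_ii and Var_j = Covar_jj."""
    var_x = covariance(n, sum_x, sum_x, sum_xx)
    var_y = covariance(n, sum_y, sum_y, sum_yy)
    return covariance(n, sum_x, sum_y, sum_xy) / math.sqrt(var_x * var_y)


# Hypothetical data: x = (1, 2, 3), y = (2, 4, 6), summarised only by its sums.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
n = len(xs)
sums = dict(
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
    sum_xx=sum(x * x for x in xs),
    sum_yy=sum(y * y for y in ys),
)
r = correlation(n, **sums)
```

Because only the sums are consulted, the raw observations need not be retained once they have been accumulated.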



The principal components may then be used to provide an indication of the
relative importance of the independent variables based on the covariance or
correlation tables computed from the knowledge entity 46, without requiring
re-computation based on the entire collection of data.
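
The five steps can be sketched in Python as follows. This is a minimal illustration, not the method of the specification: it assumes a small symmetric covariance matrix is already available, and it finds eigenvalues by power iteration with deflation, a technique chosen here only because it needs no external library. All names and the example matrix are hypothetical.

```python
import math


def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_eigenpair(a, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric positive semi-definite matrix."""
    n = len(a)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the matrix
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
    return eigenvalue, v


def principal_components(cov, k):
    """Steps 2 to 5: the k largest eigenvalues with their eigenvectors, largest first."""
    size = len(cov)
    a = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        lam, v = top_eigenpair(a)
        pairs.append((lam, v))
        # Deflation: subtract the component just found so the next largest surfaces.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return pairs


cov = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical 2x2 covariance matrix
components = principal_components(cov, k=2)
```

For this example matrix the eigenvalues are 3 and 1, so the first component lies along the direction in which both variables increase together.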

It will therefore be recognised that the controller 40 can switch among any of
the above models, and the modeller 48 will be able to use the same knowledge entity
46 for the new model. That is, the analytical engine can use the same knowledge
entity for many modelling methods. There are many models in addition to the ones
mentioned above that can be used by the analytical engine. For example, the OneR
Classification Method, Linear Support Vector Machine and Linear Discriminant
Analysis are all readily employed by this engine. Pertinent details are provided in the
following paragraphs.



The OneR Method 



The main goal in the OneR Method is to find the best independent variable (Xj)
which can explain the dependent variable (Xi). If the dependent variable is
categorical there are many ways that the analytical engine can find the best independent
variable (e.g. Bayes rule, Entropy, Chi2, and Gini index). All of these ways can
employ the knowledge elements of the knowledge entity. If the dependent variable is
numerical the correlation matrix (again, extracted from the knowledge entity) can be
used by the analytical engine to find the best independent variable. Alternatively, the
engine can transform the numerical variable to a categorical variable by a
discretization technique.
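
For the categorical case, a minimal Python sketch of the selection is given below. It assumes the co-occurrence counts between each value of an independent variable and each class of the dependent variable are available, and it scores each variable by the classic OneR error count rather than by entropy, Chi2 or the Gini index. The nested dictionary layout, the variable names and the counts are hypothetical.

```python
def one_r(counts):
    """Pick the independent variable whose one-level rule makes the fewest errors.

    counts[variable][value][label] is a co-occurrence count: how often
    `variable == value` was observed together with class `label`.
    """
    best = None
    for variable, by_value in counts.items():
        rule = {}
        errors = 0
        for value, by_label in by_value.items():
            majority = max(by_label, key=by_label.get)
            rule[value] = majority
            # Every observation not in the majority class is misclassified.
            errors += sum(by_label.values()) - by_label[majority]
        if best is None or errors < best[1]:
            best = (variable, errors, rule)
    return best


# Hypothetical counts for two candidate variables and a yes/no dependent variable.
counts = {
    "outlook": {"sunny": {"yes": 2, "no": 3}, "rainy": {"yes": 4, "no": 1}},
    "windy": {"true": {"yes": 3, "no": 2}, "false": {"yes": 3, "no": 2}},
}
variable, errors, rule = one_r(counts)
```

Swapping the error count for another criterion changes only the scoring line; the counts consulted stay the same.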



Linear Support Vector Machine 



The Linear Support Vector Machine can be modeled by using the covariance
matrix. As shown in [0079] the covariance matrix can easily be computed from the
knowledge elements of the knowledge entity by the analytical engine.

Linear Discriminant Analysis 

Linear Discriminant Analysis is a classification technique and can be modeled 
by the analytical engine using the covariance matrix. As shown in [0079] the 
covariance matrix can easily be computed from the knowledge elements of the 
knowledge entity. 

Model Diversity 



As evident above, use of the analytical engine with even a single knowledge
entity can provide extremely rapid model development and great diversity in models.
Such easily obtained diversity is highly desirable when seeking the most suitable
model for a given purpose. In using the analytical engine, diversity originates both
from the intelligent properties awarded to any single model (e.g. addition and
removal of variables, dimension reduction) and the property that switching modelling
methods does not require new computations on the entire database for a wide variety
of modelling methods. Once provided with the models, there are many methods for
determining which one is best ("model discrimination") or which prediction is best.
The analytical engine makes model generation so comprehensive and easy that for the
latter problem, if desired, several models can be tested and the prediction accepted
can be the one which the majority of models support.
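
The majority rule just described can be sketched in a few lines of Python. The function name and the example votes are hypothetical, and ties are resolved in favour of the prediction seen first, which is an assumption of this sketch.

```python
from collections import Counter


def majority_prediction(predictions):
    """Return the prediction supported by most models (ties go to the first seen)."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Hypothetical outputs of four models built from the same knowledge entity.
votes = ["Flexion", "Extension", "Flexion", "Flexion"]
chosen = majority_prediction(votes)
```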

It will be recognised that certain uses of the knowledge entity 46 by the
analytical engine will typically use certain models. The following examples illustrate
several areas where the above models can be used. It is noted that the knowledge
entity 46 facilitates changing between each of the models for each of the following
examples.

The above description of the invention has focused upon control of a process
involving numerical values. As will be seen below, the underlying principles are
actually much more general in applicability than that.

CONTROL OF A ROBOTIC ARM

In this embodiment an amputee has been fitted with a robotic arm 200 as
shown in Figure 9. The arm has an upper portion 202 and a forearm 204 connected
by a joint 205. The movement of the robotic arm depends upon two sensors 206, 208,
each of which generates a voltage based upon direction from the person's brain. One
of these sensors 208 is termed "Biceps" and is for the upper muscle of the arm. The
second 206 is termed "Triceps" and is for the lower muscle. The arm moves in
response to these two signals and this movement has one of four possibilities:
flexion 210 (the arm flexes), extension 210 (the arm extends), pronation 212 (the arm
rotates downwards) and supination 212 (the arm rotates upwards). The usual way of
relating movement to the sensor signals would be to gather a large amount of data on
what movement corresponds to what sensor signals and to train a classification
method with this data. The resulting relationship would then be used without
modification to move the arm in response to the signals. The difficulty with this
approach is its inflexibility. For example, with wear of parts in the arm the relationship
determined from training may no longer be valid and a complete new retraining
would be necessary. Other problems can include the failure of one of the sensors or
the need to add a third sensor. The knowledge entity 46 described above may be used
by the analytical engine to develop a control of the arm divided into three steps:
learner, modeller and predictor. The result is that control of the arm can then adapt to
new situations as in the previous example.

The previous example showed a situation where all the variables were numeric
and linear regression was used following the learner. This example shows how the
learner can employ categorical values and how it can work with a classification
method.

Exemplary data collected for use by the robotic arm is as follows:

        Biceps    Triceps    Movement
        13        31         Flexion
        14        30         Flexion
        10        31         Flexion
        90        22         Extension
        87        19         Extension
        65        15         Extension
        28        16         Pronation
        27        12         Pronation
        33        11         Pronation
        7?        24         Supination
        70        36         Supination
        58        28         Supination

Table 47 

The record corresponding to the first measurement (13, 31, Flexion) is transformed
as follows using the set of combinations N_ij, ΣX_i, ΣX_j and ΣX_iX_j, as set out
below in Table 48.

                                                       Movement
                            Biceps   Triceps   Flexion   Extension   Pronation   Supination
Biceps          N_ij             1         1         1           1           1            1
                ΣX_i            13        13        13          13          13           13
                ΣX_j            13        31         1           0           0            0
                ΣX_iX_j        169       403        13           0           0            0
Triceps         N_ij             1         1         1           1           1            1
                ΣX_i            31        31        31          31          31           31
                ΣX_j            13        31         1           0           0            0
                ΣX_iX_j        403       961        31           0           0            0
Movement:
Flexion         N_ij             1         1         1           1           1            1
                ΣX_i             1         1         1           1           1            1
                ΣX_j            13        31         1           0           0            0
                ΣX_iX_j         13        31         1           0           0            0
Extension       N_ij             1         1         1           1           1            1
                ΣX_i             0         0         0           0           0            0
                ΣX_j            13        31         1           0           0            0
                ΣX_iX_j          0         0         0           0           0            0
Pronation       N_ij             1         1         1           1           1            1
                ΣX_i             0         0         0           0           0            0
                ΣX_j            13        31         1           0           0            0
                ΣX_iX_j          0         0         0           0           0            0
Supination      N_ij             1         1         1           1           1            1
                ΣX_i             0         0         0           0           0            0
                ΣX_j            13        31         1           0           0            0
                ΣX_iX_j          0         0         0           0           0            0

Table 48


Once records as shown in Table 48 have been learned by the learner 44 into 
the knowledge entity 46, the modeller 48 can construct appropriate models of various 
movements. The predictor can then compute the values of the four models: 

Flexion = a + b_1 * Biceps + b_2 * Triceps

Extension = a + b_1 * Biceps + b_2 * Triceps

Pronation = a + b_1 * Biceps + b_2 * Triceps

Supination = a + b_1 * Biceps + b_2 * Triceps



When signals are received from the Biceps and Triceps sensors the four 
possible arm movements are calculated. The Movement with the highest value is the 
one which the arm implements. 
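
The learner, modeller and predictor steps for this example can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the specification: each movement is encoded as a 0/1 indicator, each of the four models is fitted by ordinary least squares through the normal equations, and the one row of the exemplary data whose Biceps reading is illegible in the source is left out. Function and variable names are hypothetical.

```python
MOVES = ["Flexion", "Extension", "Pronation", "Supination"]
NAMES = ["Biceps", "Triceps"] + MOVES

# Exemplary data; the row whose Biceps reading is illegible in the source is omitted.
DATA = [
    (13, 31, "Flexion"), (14, 30, "Flexion"), (10, 31, "Flexion"),
    (90, 22, "Extension"), (87, 19, "Extension"), (65, 15, "Extension"),
    (28, 16, "Pronation"), (27, 12, "Pronation"), (33, 11, "Pronation"),
    (70, 36, "Supination"), (58, 28, "Supination"),
]


def learn(data):
    """Learner: accumulate N, sum(X_i) and sum(X_i * X_j) one record at a time."""
    n, s = 0, {a: 0.0 for a in NAMES}
    ss = {(a, b): 0.0 for a in NAMES for b in NAMES}
    for biceps, triceps, move in data:
        rec = {"Biceps": biceps, "Triceps": triceps}
        rec.update({m: 1.0 if m == move else 0.0 for m in MOVES})
        n += 1
        for a in NAMES:
            s[a] += rec[a]
            for b in NAMES:
                ss[a, b] += rec[a] * rec[b]
    return n, s, ss


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


def model(n, s, ss, target):
    """Modeller: least-squares a, b_1, b_2 for one movement, from the sums alone."""
    x1, x2 = "Biceps", "Triceps"
    normal = [[n, s[x1], s[x2]],
              [s[x1], ss[x1, x1], ss[x1, x2]],
              [s[x2], ss[x1, x2], ss[x2, x2]]]
    return solve(normal, [s[target], ss[x1, target], ss[x2, target]])


def predict(models, biceps, triceps):
    """Predictor: evaluate all four models and return the highest-scoring movement."""
    scores = {m: a + b1 * biceps + b2 * triceps for m, (a, b1, b2) in models.items()}
    return max(scores, key=scores.get), scores


n, s, ss = learn(DATA)
models = {m: model(n, s, ss, m) for m in MOVES}
movement, scores = predict(models, 13, 31)
```

Because the four indicator variables sum to one in every record, the four fitted values also sum to one for any pair of sensor readings, which gives a quick consistency check on the models.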

PREDICTION OF THE START CODON IN GENOMES 

Each DNA (deoxy-ribonucleic acid) molecule is a long chain of nucleotides of 
four different types, adenine (A), cytosine (C), thymine (T), and guanine (G). The 
linear ordering of the nucleotides determines the genetic information. The genome is 
the totality of DNA stored in chromosomes typical of each species and a gene is a part 
of DNA sequence which codes for a protein. Genes are expressed by transcription 
from DNA to mRNA followed by translation from mRNA to protein. mRNA 
(messenger ribonucleic acid) is chemically similar to DNA, with the exception that 
the base thymine is replaced with the base uracil (U). A typical gene consists of these 
functional parts: promoter -> start codon -> exon -> stop codon. The region 
immediately upstream from the gene is the promoter and there is a separate promoter 
for each gene. The promoter controls the transcription process in genes and the start 
codon is a triplet (usually ATG) where the translation starts. The exon is the coding 
portion of the gene and the stop codon is a triplet where the translation stops.
Prediction of the start codon from a measured length of DNA sequence may be 
performed by using the Markov Chain to calculate the probability of the whole 
sequence. That is, given a sequence s, and given a Markov chain M, the basic 
question to answer is, "What is the probability that the sequence s is generated by the 
Markov chain M?" The problems with the conventional Markov chain were described
above. Here these problems can cause poor predictability because in fact, in genes the 
next state, not just the previous state, does affect the structure of the start codon.

ATTTCT AGG AGT ACC

        X1    X2
        A     T
        T     T
        T     C
        C     T
        T     A
        A     G
        G     G
        G     A
        A     G
        G     T
        T     A
        A     C
        C     C


Table 49 



Classic Markov Chain:

Record 1: AT

                      X1
                  A    C    G    T
        X2   A    0    0    0    0
             C    0    0    0    0
             G    0    0    0    0
             T    1    0    0    0

Table 50


A Markov Chain stored in knowledge entity 46 is constructed as follows: 



The first Record 1: 1, 0, 0, 0, 0, 0, 0, 1 is transformed to the table:

                            X1                     X2
                            A    C    G    T       A    C    G    T
X1    A     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            1    1    1    1       1    1    1    1
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         1    0    0    0       0    0    0    1
      C     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            0    0    0    0       0    0    0    0
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         0    0    0    0       0    0    0    0
      G     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            0    0    0    0       0    0    0    0
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         0    0    0    0       0    0    0    0
      T     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            0    0    0    0       0    0    0    0
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         0    0    0    0       0    0    0    0
X2    A     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            0    0    0    0       0    0    0    0
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         0    0    0    0       0    0    0    0
      C     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            0    0    0    0       0    0    0    0
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         0    0    0    0       0    0    0    0
      G     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            0    0    0    0       0    0    0    0
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         0    0    0    0       0    0    0    0
      T     N_ij            1    1    1    1       1    1    1    1
            ΣX_i            1    1    1    1       1    1    1    1
            ΣX_j            1    0    0    0       0    0    0    1
            ΣX_iX_j         1    0    0    0       0    0    0    1


Table 51 



The knowledge entity 46 is built up by the analytical engine from records
relating to each measurement. Controller 40 can then operate to determine the
probability that a start codon is generated by the Markov Chain represented in the
knowledge entity 46.
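
As a hedged illustration of the classic first-order chain only, the Python sketch below counts the transitions in the example sequence, converts the counts to conditional probabilities and scores a short query sequence. It does not reproduce the knowledge entity layout of Table 51, it ignores the probability of the first base, and the function names and the query "AGT" are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def transition_counts(sequence):
    """Count how often base x1 is immediately followed by base x2."""
    return Counter(zip(sequence, sequence[1:]))


def transition_probabilities(counts):
    """Turn counts into P(x2 | x1); bases never seen as x1 are left out."""
    probs = {}
    for x1 in BASES:
        total = sum(counts[(x1, x2)] for x2 in BASES)
        if total:
            probs[x1] = {x2: counts[(x1, x2)] / total for x2 in BASES}
    return probs


def log_probability(sequence, probs):
    """Log-probability of the transitions in `sequence` (-inf if one was never seen)."""
    total = 0.0
    for x1, x2 in zip(sequence, sequence[1:]):
        p = probs.get(x1, {}).get(x2, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


training = "ATTTCTAGGAGTACC"  # the example sequence, written without spaces
counts = transition_counts(training)
probs = transition_probabilities(counts)
score = log_probability("AGT", probs)
```

Working in log-probabilities avoids numerical underflow when the query sequence is long.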



SALES PREDICTION

The next embodiment shows that the model to be used with the learner in the
analytical engine can be non-linear in the independent variable. In this embodiment 
sales from a business are to be related to the number of competitors' stores in the area, 
average age of the population in the area and the population of the area. The example 
shows that the presence of a non-linear variable can easily be accommodated by the 
method. Here, it was decided that the logarithm of the population should be used 
instead of simply the population. The knowledge entity is then formed as follows: 



        No. of         Average Age    Log             Sales
        Competitors                   (Population)
        2              40             4.4             850000
        2              37             4.4             1100000
        3              36             4.3             920000
        2              31             4.2             950000
        1              42             4.6             107000

Table 52











From the record: 2, 40, 4.4, 850000, the knowledge entity 46 is generated 
as set out below in Table 53. 





                                  No. of         Average Age    Log             Sales
                                  Competitors                   (Population)
No. of            N_ij            1              1              1               1
Competitors       ΣX_i            2              2              2               2
                  ΣX_j            2              40             4.4             850000
                  ΣX_iX_j         4              80             8.8             1700000
Average Age       N_ij            1              1              1               1
                  ΣX_i            40             40             40              40
                  ΣX_j            2              40             4.4             850000
                  ΣX_iX_j         80             1600           176             34000000
Log (Population)  N_ij            1              1              1               1
                  ΣX_i            4.4            4.4            4.4             4.4
                  ΣX_j            2              40             4.4             850000
                  ΣX_iX_j         8.8            176            19.36           3740000
Sales             N_ij            1              1              1               1
                  ΣX_i            850000         850000         850000          850000
                  ΣX_j            2              40             4.4             850000
                  ΣX_iX_j         1700000        34000000       3740000         722500000000

Table 53



The sales are modelled using the relationship: 

Sales = a + b_1 * No. of Competitors + b_2 * Average Age + b_3 * Log (Population)

The coefficients may then be derived from the knowledge entity 46 as 
described above. 
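
The way a non-linear variable enters the knowledge entity can be illustrated with the short Python sketch below. It applies the transform before learning and then updates every cell of the entity from a single record, reproducing the cross products listed for the record 2, 40, 4.4, 850000. The cell layout, the function names and the choice of a base-10 logarithm are assumptions of this sketch; the specification does not state the base.

```python
import math

FIELDS = ["No. of Competitors", "Average Age", "Log (Population)", "Sales"]


def to_record(competitors, age, population, sales):
    """Apply the non-linear transform before learning (base-10 log is an assumption)."""
    return (competitors, age, math.log10(population), sales)


def new_entity():
    """Empty knowledge entity: one [N, sum X_i, sum X_j, sum X_i*X_j] cell per pair."""
    return {(a, b): [0, 0.0, 0.0, 0.0] for a in FIELDS for b in FIELDS}


def learn(entity, record):
    """Add one record to every cell; no earlier record needs to be revisited."""
    values = dict(zip(FIELDS, record))
    for (a, b), cell in entity.items():
        cell[0] += 1
        cell[1] += values[a]
        cell[2] += values[b]
        cell[3] += values[a] * values[b]
    return entity


# The record quoted in the text, with the logarithm already applied.
entity = learn(new_entity(), (2, 40, 4.4, 850000))
age_sales = entity["Average Age", "Sales"]
```

Since the transform is applied to the record and not to the stored sums, the rest of the learner is unchanged by the choice of transform.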

The ability to diagnose the cause of problems, whether in machines or human
beings, is an important application of the knowledge entity 46.



DISEASE DIAGNOSIS 



In this part we want to use the analytical engine to predict a hemolytic disease
of the newborn by means of three variables (sex, blood hemoglobin, and blood
bilirubin).



        Newborn      Sex       Hemoglobin    Bilirubin
        Survival     Female    18            2.2
        Survival     Male      16            4.1
        Death        Female    7.5           6.7
        Death        Male      3.5           4.2











Table 54 



A knowledge entity for constructing a naive Bayesian classifier would be as
follows (for the first and fourth records only):

Record 1: Survival, Female, 18, 2.2
Record 4: Death, Male, 3.5, 4.2

Because the records contain categorical values, these are first transformed to
numerical ones:

Record 1 (transformed): 1, 0, 1, 0, 18, 2.2
Record 4 (transformed): 0, 1, 0, 1, 3.5, 4.2





                        Newborn               Sex
                        Survival    Death     Female    Male    Hemoglobin    Bilirubin
Survival      N         2           2         1         1       1             1
              ΣX        1           1         1         0       18            2.2
              ΣX²       1           1         1         0       324           4.84
Death         N         2           2         1         1       1             1
              ΣX        1           1         0         1       3.5           4.2
              ΣX²       1           1         0         1       12.25         17.64



As we can see this knowledge entity is not orthogonal and uses three
combinations of the variables (N, ΣX and ΣX²) which are enough to model a naive
Bayesian classifier. The knowledge entity 46 may be used to predict survival or death
using the Bayesian classification model described above.
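
A minimal Python sketch of such a classifier is given below. It keeps only N, ΣX and ΣX² per class (plus per-class counts of each sex) for the four records of Table 54 and scores a new case with a Gaussian likelihood for the two numerical variables. The Gaussian form, the population variance, the small floor values and all names are assumptions of this sketch, not details taken from the specification.

```python
import math

RECORDS = [  # Table 54: (outcome, sex, hemoglobin, bilirubin)
    ("Survival", "Female", 18.0, 2.2), ("Survival", "Male", 16.0, 4.1),
    ("Death", "Female", 7.5, 6.7), ("Death", "Male", 3.5, 4.2),
]


def learn(records):
    """Keep only N, sum(X) and sum(X^2) per class, plus per-class counts of each sex."""
    stats = {}
    for outcome, sex, hgb, bili in records:
        s = stats.setdefault(
            outcome, {"n": 0, "sex": {}, "sum": [0.0, 0.0], "sq": [0.0, 0.0]})
        s["n"] += 1
        s["sex"][sex] = s["sex"].get(sex, 0) + 1
        for i, x in enumerate((hgb, bili)):
            s["sum"][i] += x
            s["sq"][i] += x * x
    return stats


def log_gaussian(x, mean, var):
    """Log of the normal density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(stats, sex, hgb, bili):
    """Return the class with the highest naive Bayes log-score."""
    total = sum(s["n"] for s in stats.values())
    scores = {}
    for outcome, s in stats.items():
        score = math.log(s["n"] / total)  # class prior
        score += math.log(max(s["sex"].get(sex, 0) / s["n"], 1e-9))
        for i, x in enumerate((hgb, bili)):
            mean = s["sum"][i] / s["n"]
            var = max(s["sq"][i] / s["n"] - mean * mean, 1e-9)  # population variance
            score += log_gaussian(x, mean, var)
        scores[outcome] = score
    return max(scores, key=scores.get)


stats = learn(RECORDS)
outcome = predict(stats, "Female", 17.0, 3.0)
```

With only two records per class the variances are rough estimates, so the sketch shows the mechanics and should not be read as a usable diagnostic.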

From the above examples, it will be recognised that the knowledge entity of
Figure 3 may be applied in many different areas. A sampling of some areas of
applicability follows.

BANKING AND CREDIT SCORING

In banking and credit scoring applications, it is often necessary to determine
the risk posed by a client, or other measures relating to the client's finances. In
banking and credit scoring, the following variables are often used:

checking_status, duration, credit_history, purpose, credit_amount,
savings_status, employment, installment_commitment, personal_status,
other_parties, residence_since, property_magnitude, age,
other_payment_plans, housing, existing_credits, job, num_dependents,
own_telephone, foreign_worker, credit_assessment.

Dynamic query is particularly important in applications such as credit
assessment where an applicant is waiting impatiently for a decision and the
assessor has many questions from which to choose. By having the analytical
engine select the "next best question" the assessor can rapidly converge on a
decision.

BIOINFORMATICS AND PHARMACEUTICAL SOLUTIONS 

The example above showed gene prediction using Markov models. There are 
many other applications to bioinformatics and pharmaceuticals. 

In a microarray, the goal is to find a match between a known sequence and
that of a disease.

In drug discovery the goal is to determine the performance of drugs as a 
function of type of drug, characteristics of patients, etc. 



ECOMMERCE AND CRM

Applications to eCommerce and CRM include email analysis, response and
marketing.

Fraud Detection

In order to detect fraud on credit cards, the knowledge entity 46 would use
variables such as number of credit card transactions, value of transactions, location of
transaction, etc.

HEALTH CARE AND HUMAN RESOURCES 

Diagnosis of the cause of abdominal pain uses approximately 1000
different variables.

In an application to the diagnosis of the presence of heart disease, the variables
under consideration are:

age, sex, chest pain type, resting blood pressure, blood cholesterol,
blood glucose, rest ekg, maximum heart rate, exercise induced angina,
extent of narrowing of blood vessels in the heart

PRIVACY AND SECURITY 

The areas of privacy and security often require image analysis, finger print
analysis, and face analysis. Each of these areas typically involves many variables
relating to the image, and attempts to match images and find patterns.



Retail 



In the retail industry, the knowledge entity 46 may be used for inventory
control and sales prediction.

SPORTS AND ENTERTAINMENT 

The knowledge entity 46 may be used by the analytical engine to collect 
information on sports events and predict the winner of a future sports event. 

The knowledge entity 46 may also be used as a coaching aid. 

In computer games, the knowledge entity 46 can manage the data required by
the game's artificial intelligence systems.

STOCK AND INVESTMENT ANALYSIS AND PREDICTION 

By employing the knowledge entity 46, the analytical engine is particularly
adept at handling areas like investment decision making and stock price prediction,
where there is a large amount of data which is constantly updated as stock trades are
made on the market.

TELECOM, INSTRUMENTATION AND MACHINERY 

The areas of telecom, instrumentation and machinery have many applications,
such as diagnosing problems and controlling robotics.

TRAVEL 

Yet another application of the analytical engine employing the knowledge
entity 46 is as a travel agent. The knowledge entity 46 can collect information about
travel preferences, costs of trips, and types of vacations to make predictions related to
the particular customer.

From the preceding examples, it will be recognised that the knowledge entity
46, when used with the appropriate methods to form the analytical engine, has broad
applicability in many environments. In some embodiments, the knowledge entity 46
has much smaller storage requirements than that required for the equivalent amount of
observed data. Some embodiments of the knowledge entity 46 use parallel processing
to provide increases in the speed of computations. Some embodiments of the
knowledge entity 46 allow models to be changed without re-computation. It will
therefore be recognised that in various embodiments, the analytical engine provides
an intelligent learning machine that can rapidly learn, predict, control, diagnose,
interact, and co-operate in dynamic environments, including for example large
quantities of data, and further provides a parallel processing and distributed
processing capability.



